In this Issue 

Computers controlling instruments isn't a new story anymore, nor is built-in 
instrument intelligence. But few who don't use them know how very intelligent 
some of our electronic instruments have become. Our cover subject, the HP 
3562A Dynamic Signal Analyzer, is an example of this trend. It's a measuring 
instrument, designed for low-frequency network and spectrum analysis, but 
you can use it to do computer-aided design (CAD) without a computer. It 
not only measures and analyzes, but also synthesizes and models, all by 
itself: no mainframe, no workstation, no PC. With just this instrument, you 
can do a whole linear network design: 

* Decide what shape you want the network response to have and synthesize it using the HP 
3562A's built-in synthesis capabilities. The instrument will fit a rational polynomial to the response 
curve and compute the roots of the denominator and numerator polynomials, that is, the poles 
and zeros of the response. From these you can choose a network topology and component 
values. 

* Build the prototype and measure its response in any of the analyzer's three measurement 
modes. Compare it with the response you wanted. 

* Extract the prototype's actual poles and zeros and modify the design to get closer to the desired 
result. 

* Repeat as necessary. 

On pages 4 to 35 of this issue, the HP 3562A's designers explain how it works and what it can 
do. Its basic functions are described in the article on page 4, and the details of its measurement 
modes are in the article on page 17. Unusual is the analyzer's digital demodulation capability. 
Give the analyzer a modulated carrier, and if the modulation is within its frequency range, it can 
extract and analyze it. It doesn't matter if you don't know the carrier frequency or whether the 
modulation is amplitude, phase, or a combination (as long as the two modulating waveforms don't 
have overlapping spectra). The article on page 33 reveals the theory of operation of the curve 
fitter and tells how so much computing power was made to fit in the available memory. The article 
on page 25 walks us through several examples of the use of the HP 3562A to solve realistic 
analysis and design problems. 

The HP 3000 Series 70 Business Computer is the most powerful of the pre-HP-Precision-Architecture 
HP 3000s. The objective for its design was to upgrade the Series 68's performance 
significantly in a short time. Measurement, modeling, and verification were used to identify and 
evaluate possible design changes. The paper on page 38 describes the methods and how they 
were applied to the Series 70's cache memory subsystem, its major improvement. According to 
the authors, the design of a cache provides a severe test for any estimation methodology. They 
feel they have advanced the state of the art in cache measurement and prediction. 



-R. P. Dolan 

Cover 

A principal use of the HP 3562A Analyzer is the design of servo systems such as the head 
positioning mechanisms for disc drives. 



What's Ahead 

The February issue will feature several articles on the design of a new family of fiber optic test 
instruments. The family includes three LED sources, an optical power meter with a choice of two 
optical heads, two optical attenuators, and an optical switch. Microwave transistor measurements 
using a special fixture and de-embedding techniques will also be treated. 
Low-Frequency Analyzer Combines 
Measurement Capability with Modeling 
and Analysis Tools 

HP's next-generation two-channel FFT analyzer can be 
used to model a measured network in a manner that 
simplifies further design. 

by Edward S. Atkinson, Gaylord L. Wahl, Jr., Michael L. Hall, Eric J. Wicklund, and Steven K. Peterson 



THE NAME FFT ANALYZER has been applied to a 
category of signal analysis instruments because their 
dominant (in some cases, their only) analysis feature 
has been the calculation of the fast Fourier transform of 
the input signals for spectrum and network response 
measurements. These analyzers produce an estimate of a 
network's frequency response function at equally spaced 
frequency intervals. 

These FFT analyzers have justified their use for low-frequency 
signal analysis mainly because of their higher measurement 
speed when compared to conventional swept frequency 
response analyzers. However, proponents of swept 
sine analyzers are quick to point out their instruments' 
wider dynamic range and ability to characterize nonlinearities 
in a network. Proponents of 1/3-octave analyzers 
jump into the fray by extolling the advantages of logarithmically 
spaced spectrum analysis. These debates about 
which is the "best" analyzer can never really be won, since, 
in reality, all of these measurement techniques have their 
advantages depending on the application. 

This fact was recognized in the design of the HP 3562A 
Dynamic Signal Analyzer (Fig. 1), which provides three 
different measurement techniques for low-frequency analysis 
within one instrument: 

■ FFT-based, linear resolution spectrum and network 
analysis 

■ Log resolution spectrum and network analysis 

■ Swept sine network analysis. 

These measurement techniques use advanced digital signal 
processing algorithms that result in more accurate and more 
repeatable measurements than previously available with 
conventional analog implementations. 




Fig. 1. The HP 3562A Dynamic Signal Analyzer performs fast, accurate 
network, spectrum, and waveform measurements from dc to 100 kHz. 
Measurements include power spectrum, histogram, frequency response, 
and cross-correlation. These can be performed in real time or on stored 
data. Built-in analysis and modeling capabilities can derive poles 
and zeros from measured frequency responses or construct 
phase and magnitude responses from user-supplied models. Direct 
control of external digital plotters and disc drives allows easy 
generation of hard copy and storage of measurement setups and data. 
The HP 3562A has two input channels, each having an 
overall frequency range from 64 µHz to 100 kHz and a 
dynamic range of 80 dB. 

Measurement Modes 

Linear Resolution. In the linear resolution (FFT) mode, the 
HP 3562A provides a broad range of time, frequency, and 
amplitude domain measurements: 

■ Frequency domain — linear spectrum, power spectrum, 
cross spectrum, frequency response, and coherence func- 
tions 

■ Time domain — averaged time records, autocorrelation, 
cross-correlation, and impulse response functions 

■ Amplitude domain — histogram, probability density 
function (PDF), and cumulative distribution function 
(CDF). 

A special feature in the linear resolution mode is the 
ability to perform AM, FM, or PM demodulation on each 
input channel. Traditionally, demodulation has been performed 
using separate analog demodulators whose outputs 
are connected to the test instrument. The digital demodulation 
technique in the HP 3562A has the advantages of 
higher accuracy and greater dynamic range than analog 
demodulation, and it is built into the test instrument. 

The type of demodulation is independently selectable 
for each channel. For example, a frequency response measurement 
can be performed using AM demodulation on 
Channel 1 and PM demodulation on Channel 2. As another 
example, a two-channel power spectrum measurement can 
be set up in which AM demodulation is specified for Channel 1 
and no demodulation is selected for Channel 2. One 
can easily perform two types of demodulation (e.g., AM 
and PM) simultaneously on the same input signal by connecting 
it to both channels. 

A preview mode allows the user to view (and modify) 
the modulated input signal before the demodulation process 
is invoked. There is also a special demod-polar display 
that shows the locus of the carrier vector as its amplitude 
and phase vary for AM and PM signals. This display is 
very useful for observing possible interrelationships between 
AM and PM signals. 

Log Resolution. In many applications, the network or signals 
of interest are best characterized in terms of 
logarithmic or proportional frequency resolution. The HP 
3562A provides a true proportional resolution measurement, 
not simply a linear resolution measurement displayed 
on a log frequency scale. 

In this mode, the user can make the following frequency 
domain measurements: power spectrum, cross spectrum, 
frequency response, and coherence functions. For log resolution 
measurements, the averaged quantity is always a 
power quantity. Stable, exponential, and peak hold averaging 
modes are available and offer the same benefits as in 
the linear resolution mode. 

The user can select a frequency span from one to five 
decades with a fixed resolution of 80 logarithmically 
spaced spectral lines per decade. For example, if the start 
frequency is set to 1 Hz and the span is set to five decades, 
the frequency range of the measurement will be from 1 Hz 
to 100 kHz with 400 lines of resolution. Both random noise 
and fixed sine source outputs are available in this mode. 
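
The line count in this example follows directly from the fixed resolution. The short Python sketch below reproduces the arithmetic; it assumes the lines are spaced uniformly in log frequency and places one line on each upper decade edge, which is an illustrative choice and not a detail taken from the instrument. The function name is hypothetical.

```python
LINES_PER_DECADE = 80  # fixed log resolution stated in the text

def log_lines(start_hz, decades):
    """Return logarithmically spaced line frequencies above start_hz."""
    n = LINES_PER_DECADE * decades
    return [start_hz * 10 ** (k / LINES_PER_DECADE) for k in range(1, n + 1)]

lines = log_lines(1.0, 5)
print(len(lines), lines[-1])  # 400 lines, the last at about 100 kHz
```

Each line sits a constant ratio of 10^(1/80), roughly 2.9%, above its neighbor, which is what makes the resolution proportional rather than linear.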



Swept Sine. When swept sine mode is selected, the HP 
3562A is transformed into an enhanced swept frequency 
response analyzer. In this mode the user can make the 
following frequency domain measurements: power spectrum, 
cross spectrum, frequency response, and coherence 
functions. The user can select either linear or log sweep 
with a full range of sweep controls, including sweep up, 
sweep down, sweep hold, and manual sweep. During the 
sweep, the HP 3562A's built-in source outputs a phase-continuous, 
stepped sine wave across the selected frequency 
span and a single-point Fourier transform is performed on 
the input signal. 
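
A single-point Fourier transform can be pictured as correlating the input record with a complex exponential at the stimulus frequency only. The Python sketch below illustrates the idea under simple assumptions (a record spanning a whole number of stimulus cycles, no windowing); the function name, sample rate, and signal values are hypothetical and do not describe the analyzer's internal algorithm.

```python
import cmath
import math

def single_point_dft(samples, freq_hz, sample_rate_hz):
    """Complex amplitude of the freq_hz component of samples.

    Correlates the record with a complex exponential at one frequency,
    i.e. evaluates a single bin of the Fourier transform.
    """
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
              for k, x in enumerate(samples))
    return 2 * acc / n  # scale so a sinusoid of amplitude A gives |result| = A

fs, f0 = 10_000.0, 250.0  # assumed sample rate and stimulus frequency
record = [0.5 * math.cos(2 * math.pi * f0 * k / fs + 0.3) for k in range(400)]
resp = single_point_dft(record, f0, fs)
print(abs(resp), cmath.phase(resp))  # about 0.5 and 0.3 rad
```

Dividing the result for the response channel by the result for the stimulus channel at each step of the sweep would give one point of the frequency response.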

There are four key setup parameters associated with the 
sweep for which the user can either set fixed values or 
specify an automatic mode of operation. These parameters 
are input range, integration time, source output gain, and 
frequency resolution. 

By judiciously selecting these automatic sweep features, 
the user can perform a measurement in which the sweep 
adapts dynamically to meet the requirements of the device 
under test. 

A discussion about the technical basis behind the HP 
3562A's measurement modes and digital demodulation 
capability is given in the article on page 17. 

Advanced Data Acquisition Features 

Normally, measurements are made on-line using the data 
currently being acquired from the input channels. However, 
the HP 3562A provides two modes in which data can 
be acquired and then processed off-line at a later time. 

Time Capture. In this mode, up to ten time records (20,480 
sample points) from either channel can be stored in a single 
buffer. This data can be acquired in real time for any frequency 
span up to 100 kHz. The user can display a compressed 
version of the entire time buffer or any portion of 
it in the time or frequency domain. Furthermore, the time 
capture buffer can be used as an input source for any of 
the single-channel measurements available in linear resolution 
mode. 

If a measurement is set up with the same frequency range 
as the time capture data, then up to ten averages can be 
done. At the other extreme, a measurement with one average 
can be made with a frequency span that is a factor of 
10 narrower than the original time capture frequency range. 
For example, if a time capture is performed with a span 
from dc to 10 kHz, the user can perform a power spectrum 
measurement on this data within a 1-kHz span centered 
anywhere from 500 Hz to 9.5 kHz. Several display options 
(e.g., expansion, scrolling, etc.) are available for manipulating 
time capture data. 
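
The trade-off between zoom span and available averages can be sketched with a little arithmetic. The Python fragment below assumes that narrowing the span by a factor d consumes d times as much captured data per time record, which is consistent with the two extremes quoted above but is an inference, not a statement from the text; the function name is hypothetical.

```python
RECORDS_IN_BUFFER = 10  # time capture holds up to ten time records

def zoom_limits(capture_span_hz, zoom_span_hz):
    """Return (averages, lowest_center_hz, highest_center_hz) for a zoom
    measurement on captured baseband (dc to capture_span_hz) data."""
    if not 0 < zoom_span_hz <= capture_span_hz:
        raise ValueError("zoom span must lie within the captured span")
    # A span narrower by a factor d needs d times as much time data per record.
    averages = int(RECORDS_IN_BUFFER * zoom_span_hz / capture_span_hz)
    half = zoom_span_hz / 2
    return averages, half, capture_span_hz - half

print(zoom_limits(10_000, 1_000))   # (1, 500.0, 9500.0)
print(zoom_limits(10_000, 10_000))  # (10, 5000.0, 5000.0)
```

The center-frequency limits simply keep the whole zoom span inside the captured range, which is where the 500 Hz and 9.5 kHz figures come from.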

Time Throughput. The HP 3562A is special among low-frequency 
analyzers in offering the capability to throughput 
time data directly from the input channel(s) to an external 
disc drive. No external HP-IB (IEEE 488/IEC 625) controller 
is needed for this operation. 

The instrument can throughput data to an HP CS/80 hard 
disc drive (e.g., the HP 7945) at a real-time measurement 
span of 10 kHz for single-channel operation and 5 kHz for 
dual-channel operation. The throughput data session can 
be used as an input source for both linear resolution (including 
demodulation) and log resolution measurements. 
As in the time capture mode, zoomed measurements can 
be made on real-time throughput files. However, much 
narrower zoom spans are possible since a throughput file 
can be much larger than the internal time capture buffer. 

The user can conveniently view the data in the 
throughput file by stepping through the file one record at 
a time. Measurements do not have to start at the beginning 
of the file; an offset into the file can be specified for the 
measurement start point. 

Modeling and Data Analysis Capabilities 

Given the extensive measurement capability described 
above, the design of the HP 3562A could have stopped 
there — as a versatile test instrument. However, combining 
this measurement performance with equally extensive 
built-in analysis and design capabilities turns the product 
into a one-instrument solution for many applications. 
Flexible Display Formats. Generally, the result of a mea- 
surement process is the data trace on the screen. Graphical 
analysis is simplified by the HP 3562A's full complement 
of display formats. The independent X and Y markers (both 
single and band) and the special harmonic and sideband 
markers make it easy to focus on important regions of data. 
The annotated display and the special marker functions 
(e.g., power, slope, average value, THD, etc.) allow simple 
quantitative analyses to be performed directly on the dis- 
played data. 

Waveform Calculator. The instrument includes a full complement 
of waveform math functions including add, subtract, 
multiply, divide, square root, integrate, differentiate, 
logarithm, exponential, and FFT. The operands (either real 
or complex) for the math functions can be the displayed 
trace(s), saved data traces, or user-entered constants. A 
powerful automath capability allows the user to specify a 
sequence of math operations to be performed on any of the 
standard data traces while the measurement is in progress. 
The user's custom measurement result can then be displayed 
at any time by simply pressing the automath softkey, 
which can be relabeled to indicate the name of the function. 
Some examples include group delay, Hilbert transform, 
open-loop response, and coherent output power (COP). Automath 
is discussed in more detail later in this article. 
S-Plane Calculator. The HP 3562A can synthesize fre- 
quency response functions from either pole-zero, pole-res- 
idue, or polynomial coefficient tables. In addition, the user 
can convert from one synthesis table format to another with 
a single keystroke. As an example, a designer can enter a 
model of a filter in terms of numerator and denominator 
polynomial coefficients and then convert to pole-zero for- 
mat to find the roots of the polynomials. This s-plane cal- 
culator is a powerful network modeling tool and it exists 
within the same instrument that will be used to test and 
analyze the actual network — providing on-screen compari- 
son of predicted and measured results. See the article on 
page 25 for more details about the HP 3562A's synthesis 
capabilities and some design examples. 
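
The conversion from polynomial coefficients to pole-zero form, and the synthesis of a frequency response from the coefficients, can be illustrated in a few lines of Python. This is only a sketch for a second-order section using the quadratic formula; the helper names and the 1-kHz filter values are hypothetical, and the instrument's own root-finding method is not described here.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a*s**2 + b*s + c, the poles or zeros of a second-order section."""
    d = cmath.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

def polyval(coeffs, s):
    """Evaluate a polynomial (highest power first) with Horner's rule."""
    acc = 0j
    for c in coeffs:
        acc = acc * s + c
    return acc

def freq_response(num, den, f_hz):
    """H(j*2*pi*f) for numerator and denominator coefficient lists."""
    s = 2j * cmath.pi * f_hz
    return polyval(num, s) / polyval(den, s)

# Hypothetical 1-kHz low-pass section with damping ratio 0.5.
w0 = 2 * cmath.pi * 1000.0
num, den = [w0 ** 2], [1.0, 2 * 0.5 * w0, w0 ** 2]
print(quadratic_roots(*den))                  # complex-conjugate pole pair
print(abs(freq_response(num, den, 1000.0)))   # gain at the corner, 1/(2*zeta)
```

Evaluating `freq_response` over a list of frequencies gives the predicted trace that would be compared on screen with the measured one.
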
Curve Fitter. One of the most powerful analysis features 
in the HP 3562A is a multidegree-of-freedom frequency-domain 
curve fitter for extracting the poles and zeros of a 
network from its measured frequency response function. 
The curve fitter can fit up to 40 poles and 40 zeros simultaneously 
for the entire response function or any portion 
defined by the display markers. In addition, the table of 
poles and zeros generated by the curve fit can be transferred 
to the synthesis table, a direct link between the instrument's 
modeling and analysis features. A curve-fit algorithm 
that can only fit clean data is of little practical use 
since most real-world measurements are contaminated by 
some amount of noise. The HP 3562A curve fitter removes 
biases caused by measurement noise, resulting in a greatly 
reduced noise sensitivity and, therefore, more accurate estimates 
of the pole and zero locations. See the article on 
page 33 for more details. 

Hardware Design 

A block diagram of the HP 3562A's hardware system is 
shown in Fig. 2. 

Input Amplifiers. Spectrum analysis for mechanical and 
servo applications requires that the analyzer's input 
amplifiers provide ground isolation (usually to reject dc 
and ac power-line frequency signals). Ground isolation at 
low frequencies can be achieved by running the input 
amplifiers on floating power supplies with the supply 
grounds driven by the shields of the input cables. This 
floating ground can then be used to drive a guard shield 
that encloses the floating amplifier's circuitry. This design 
is effective at rejecting dc common-mode signals and pro- 
vides a large common-mode range, but is limited in its 
ability to reject higher-frequency common-mode signals. 

Fig. 3 illustrates the problem. Common-mode signal V_cm 
is dropped across the voltage divider formed by the source 
impedance Z_s (which could be the resistance of the input 
cable) and the capacitance C between the floating ground 
and chassis ground. The voltage drop across Z_s is then 
measured by the analyzer just as if it were any other 
differential-mode signal V_dm. Because the impedance of C falls 
as frequency rises, a larger share of V_cm appears across Z_s 
with increasing frequency. The fundamental problem here 
is that truly floating the amplifier is not practical since C 
cannot be made arbitrarily small in practice and the inputs 
are not balanced with respect to the chassis. In addition, 
since the source impedances to the two input terminals 
may not be quite the same, it is desirable to minimize the 
common-mode input currents by providing a high-impedance 
path for both signal inputs. 
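
The frequency dependence of this divider is easy to quantify. The Python sketch below assumes a purely resistive source impedance in series with the stray capacitance to chassis; the 100-ohm and 100-pF values are chosen only for illustration and are not taken from the instrument.

```python
import math

def cm_to_dm_ratio(f_hz, r_source_ohm, c_farad):
    """|V across Zs| / |Vcm| for a resistive Zs in series with C to chassis."""
    x = 2 * math.pi * f_hz * r_source_ohm * c_farad  # omega * R * C
    return x / math.sqrt(1 + x * x)

# Assumed values for illustration only: 100 ohm source, 100 pF to chassis.
for f in (60.0, 1e3, 100e3):
    ratio = cm_to_dm_ratio(f, 100.0, 100e-12)
    print(f"{f:>9.0f} Hz  {20 * math.log10(ratio):7.1f} dB")
```

While omega*R*C is much less than 1, the unwanted differential signal grows in proportion to frequency, that is, by 20 dB per decade.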

An effective way of providing high impedance for both 
inputs and good balance is to use a true differential 
amplifier (Fig. 4). For a perfect amplifier with a gain of -1 
and all resistors shown of equal value, the common-mode 
rejection (CMR) will be infinite (neglecting parasitic 
components). An added benefit is that common-mode rejection 
is achieved without requiring floating power supplies, 
which would be prohibitively expensive in a multichannel 
instrument. The input amplifiers in the HP 3562A are based 
on this true differential amplifier topology with circuitry 
to provide calibration of CMR and bootstrap circuits to 
increase common-mode range and reduce distortion. 

The two input paths in each of the HP 3562A's amplifiers 
contain programmable (0 dB, 20 dB, or 40 dB) attenuators, 
buffer amplifiers, and other circuits that contribute to 
imbalances in the two signal paths. To remove these imbalances 
(and improve CMR), a programmable circuit is used 
to adjust the gain of the low signal path. As shown in Fig. 
Applications 



The HP 3562A covers a broad range of applications in control 
systems, mechanical systems, low-frequency electronics, and 
acoustics. 

Design of Closed-Loop Control Systems 

The development process is shown in Fig. 1. A particular 
application of this development process is the design of 
closed-loop control systems. The design activity involves selecting 
components that, when connected in a specified topology, 
produce system performance consistent with the design 
specification. Closely related to this activity is the modeling activity: 
developing equations to predict system performance. These 
equations are often transfer function relationships expressed as 
ratios of polynomials in the complex frequency domain. 

The design and modeling activities are simplified using the 
HP 3562A because of its built-in s-plane calculator and transfer 
function synthesis capability (see article on page 25). A designer 
can enter system information directly into the analyzer in 
pole-zero, pole-residue, or polynomial format. Conversion between 
these formats is possible. Once the system information is entered, 
the analyzer will compute and display the frequency response 
of the system. The result can be displayed in gain-phase form, 
as a Nyquist diagram, or as a Nichols chart. The designer can 
then easily perform other calculations on the synthesized 
waveform. For example, the inverse transform of the frequency 
response can be displayed with one keystroke to identify the 
impulse response of the system. By pressing another key, this 
waveform can then be integrated to simulate the step response. 
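
The chain from synthesized frequency response to impulse response to step response can be sketched numerically. The Python example below uses a hypothetical single-pole model, an assumed record length and sample interval, and a deliberately naive inverse DFT for clarity; it illustrates the sequence of operations only and is not the analyzer's implementation.

```python
import cmath
import math

N, DT = 512, 1e-3   # assumed record length and sample interval
A = 40.0            # hypothetical single pole at s = -A (rad/s)

def h_of_f(f_hz):
    """Frequency response of the example model H(s) = A / (s + A)."""
    return A / (A + 2j * math.pi * f_hz)

# Sample H at the DFT bin frequencies; upper bins hold negative frequencies.
bins = [h_of_f((k if k <= N // 2 else k - N) / (N * DT)) for k in range(N)]

# Inverse DFT (naive O(N^2) form for clarity) gives impulse response * DT.
tw = [cmath.exp(2j * math.pi * m / N) for m in range(N)]
impulse_dt = [sum(bins[k] * tw[(k * n) % N] for k in range(N)).real / N
              for n in range(N)]

# Integrating (running sum) the impulse response simulates the step response.
step, acc = [], 0.0
for v in impulse_dt:
    acc += v
    step.append(acc)

print(step[25], 1 - math.exp(-1))  # near one time constant (t = 1/A)
print(step[-1])                    # settles to the dc gain of the model
```

The small difference between the two values on the first printed line comes from band-limiting the response to the sampled frequency range and from the discrete running sum.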

The test activity involves making measurements to determine 
actual system performance. Designers requiring both FFT and 
swept sine measurements can use the HP 3562A to perform both 
functions. The HP 3562A provides not only frequency domain test 
capabilities, but also time domain test functions. Its waveform 
recording features allow the control system designer to measure time 
domain functions such as impulse, step, and ramp responses. 

The analysis activity studies measured performance. In cases 
where desired performance, such as gain-phase margin, overshoot, 
or undershoot, has been estimated, analysis consists of 
comparing measured results against the expectation. This comparison 
can be as simple as reading marker values or using the 
HP 3562A's front and back display format. 

Fig. 1. Development process flowchart. 

Often designers are constrained to measure control systems 
under operating conditions with the loop closed. Although the 
closed-loop response is measured, the open-loop response is 
desired. The HP 3562A provides an analysis function that can 
compute the open-loop response directly from closed-loop data 
with just one keystroke. 
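
For the common textbook case of unity negative feedback, the closed-loop response T and open-loop response G are related by T = G/(1 + G), so G can be recovered point by point as T/(1 - T). The Python sketch below shows that relation; the unity-feedback assumption and the sample values are illustrative, and the exact relation used by the instrument is not stated here.

```python
def open_loop_from_closed(t_closed):
    """Recover G from T = G / (1 + G), point by point in frequency.

    Assumes unity negative feedback and that T is the closed-loop
    response from reference to output; other loop arrangements need
    a different relation.
    """
    return [t / (1 - t) for t in t_closed]

# Round trip on a few hypothetical complex open-loop values.
g_true = [10 - 2j, 1 - 1j, 0.1 - 0.5j]
t_meas = [g / (1 + g) for g in g_true]
g_est = open_loop_from_closed(t_meas)
print(max(abs(a - b) for a, b in zip(g_est, g_true)))  # rounding-level error
```

Where the loop gain is large, T is close to 1 and the division by (1 - T) magnifies measurement noise, so the recovered open-loop response is least certain at low frequencies in a typical servo.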

An extremely powerful analysis tool for the control system 
designer is the HP 3562A's curve fitter (see article on page 33), 
which extracts system poles and zeros from measured frequency 
response data for comparison against expected values. Furthermore, 
the curve fitter provides the link between the analysis and 
modeling activities in the development process, because of its ability 
to pass curve-fit data to the transfer function synthesis function. 
Vibration Analysis 
Vibration measurements on rotating machinery are extremely 
important. In many cases, the vibration signals of interest are 
modulation signals on a carrier frequency which corresponds to 
the shaft rotation speed. As an example, a broken tooth on a 
rotating gear can result in amplitude-modulated vibration signals. 
In a belt-driven pulley system, the dynamic stretching of the belt 
can produce frequency-modulated or phase-modulated signals. 

The HP 3562A can perform AM, FM, or PM demodulation on 
these vibration signals even if the carrier frequency (shaft speed) 
is unknown. 

Electronic Filter Design 

All of the major modeling, test, and analysis features can come 
into play in solving filter design problems (see article on page 
25 for examples). The initial filter model (i.e., transfer function) 
can be entered into the synthesis table in the form most familiar 
to the designer and then the model's frequency response function 
can be synthesized. Log resolution measurements or log-sine 
sweeps are appropriate for measuring broadband filter networks. 
High-Q notch filters can be accurately characterized using 
zoomed linear-resolution measurements or narrow sine sweeps. 
By applying the HP 3562A's curve fitter to these linear or log 
frequency measurements, the poles and zeros of the actual filter 
can be extracted for comparison with the model. 

Audio and Acoustics 

Noise identification and control are becoming more important 
in many environments. With its two input channels, the HP 3562A 
can make acoustic intensity measurements to determine sound 
intensity and direction. In the audio field, the instrument's digital 
demodulation capability has proven to be very effective in analyzing 
modulation distortion in phonograph pickup cartridges and 
loudspeaker systems. 

5, this is accomplished by increasing the gain through the 
low path by raising the value of R6 and then driving the 
grounded end with a signal that is -K×B, where K is the 
gain of a multiplying digital-to-analog converter (DAC). 
Increasing the DAC gain reduces low-path gain to the point 
where CMR is maximized. 

An algorithm for optimum adjustment of the DAC was 
developed that requires only two data points to set the gain 
correctly. A common-mode signal is internally connected 
to both differential inputs and the DAC is set to each of its 
extreme values. The relationship between the two resulting 
measured signal levels is then used to interpolate to obtain 
the optimum DAC gain setting. The resulting CMR is better 
than -80 dB up to 66 Hz and -65 dB up to 500 Hz. 
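
One plausible reading of this two-point procedure is a straight-line interpolation to the DAC setting that nulls the common-mode output. The Python sketch below assumes the residual output is a signed quantity that varies linearly with the DAC code; that assumption, the function name, and the 8-bit example values are all hypothetical and are not taken from the instrument's firmware.

```python
def optimum_dac_code(code_lo, v_lo, code_hi, v_hi):
    """Interpolate between two measurements to the code giving zero output.

    v_lo and v_hi are signed common-mode outputs measured at the two
    extreme DAC codes; a linear dependence on the code is assumed.
    """
    if v_lo == v_hi:
        raise ValueError("output does not depend on the DAC code")
    code = code_lo - v_lo * (code_hi - code_lo) / (v_hi - v_lo)
    return min(max(round(code), code_lo), code_hi)

# Hypothetical 8-bit DAC with a residual that crosses zero near code 96.
print(optimum_dac_code(0, -0.96, 255, 1.59))  # 96
```

Only two measurements are needed because a straight line is fully determined by two points; a nonlinear dependence would call for more points or an iterative search.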

One-megohm-input buffer amplifiers are inserted between 
the inputs and the differential amplifier stage to 
provide impedance conversion. The common-mode range 
of these amplifiers is increased and distortion is reduced 
by adding bootstrap amplifiers that drive the buffer 
amplifier power supplies (Fig. 5). These bootstrap 
amplifiers are unity-gain stages with ±dc offsets added to 
the outputs so that the buffer amplifier supplies follow the 
ac input voltage. Thus, the input buffer amplifiers only 
need to provide correction to the signals present within 
their supply rails. In addition, the op amps used in these 
amplifiers are therefore able to work over a common-mode 
range that would otherwise be beyond their maximum supply 
rail specification. The resulting common-mode range 
for the HP 3562A is ±18V on the most-sensitive range (-51 
dBV) and distortion is suppressed more than 80 dB below 
full scale. 

Fig. 2. Hardware block diagram of the HP 3562A. 

Fig. 3. A floating input amplifier. 

Digital Circuitry. For certain analyzer frequency spans, the 
data collection time (time record length) will be shorter 
than the data processing time. As a result, part of the input 
data will be missed while the previous time record is being 
processed. In this case, the measurement is said to be not 
real-time. As the frequency span is decreased, the time 
record length is increased, and a span is reached where 
the corresponding data collection time equals the data 
processing time. This frequency span is called the real-time 
bandwidth. It is at this span that time data is contiguous 
and all input data is processed. Decreasing the span further 
results in data collection times longer than the data 
processing time. These measurements are said to be real-time. 

Fig. 4. True differential amplifier. 

Fig. 5. Input amplifier configuration used in the HP 3562A. 

Designing for a real-time bandwidth of 10 kHz (single 
channel) requires data processing times of less than 80 ms. 
It was apparent that a single processor could not perform 
the many computational tasks (fast Fourier transforms and 
floating-point calculations in particular) that are required 
within this amount of time. Consequently, the choice was 
made to develop separate hardware processors for these 
computational tasks. The large real-time bandwidth of the 
HP 3562A is possible because of the use of multiple 
hardware processors and a dual-bus digital architecture 
(see right half of Fig. 2). 
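
The 80-ms figure can be checked with simple arithmetic. The Python sketch below assumes a 2048-point time record (20,480 time capture points divided by ten records) and a sample rate of 2.56 times the frequency span; the 2.56 factor is inferred from the 80-ms and 10-kHz figures and is not stated explicitly in the text. The function names are hypothetical.

```python
RECORD_POINTS = 2048   # 20,480 time capture points / 10 records, from the text
RATE_PER_SPAN = 2.56   # inferred: 2048 points in 80 ms at a 10-kHz span

def record_length_s(span_hz):
    """Time needed to collect one baseband time record at a given span."""
    return RECORD_POINTS / (RATE_PER_SPAN * span_hz)

def is_real_time(span_hz, processing_time_s):
    """True when a record takes at least as long to collect as to process."""
    return record_length_s(span_hz) >= processing_time_s

print(record_length_s(10e3))       # about 0.08 s, the 80-ms budget
print(is_real_time(10e3, 0.075))   # True
print(is_real_time(100e3, 0.075))  # False: the record lasts only about 8 ms
```

Halving the span doubles the record length, so any span at or below the real-time bandwidth leaves the processors with time to spare.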

Before pipelined parallel data processing can begin, the 
input signals must be digitized and filtered to the current 
frequency span. Signals enter the analyzer and are conditioned 
by the programmable-gain, differential-input 
amplifiers described above. Low-pass anti-aliasing filters 
with a cutoff frequency of 100 kHz attenuate unwanted 
frequency components in the conditioned signal. The signal 
is then converted to digital data by analog-to-digital 
converters (ADCs) using the same design that was used in 
the earlier HP 3561A.¹ The data is then filtered to appropriate 
frequency spans by custom digital filters. Separate 
digital filters exist for each input channel. 

The digital filters provide the gateways for data into the 
digital processing portion (Fig. 2). The architecture includes 
a system central processing unit (CPU), an FFT processor, 
a floating-point processor (FPP), a shared memory 
resource (global data RAM), an interface to the display, 
and two separate digital buses that allow simultaneous 
communication and data transfer. 

The CPU controls a data processing pipeline by issuing 
commands, reading status, and servicing interrupts for the 
digital filter controller, FFT processor, FPP, and display 
interface. The CPU also services the digital source and 
front-end interface, local oscillator, keyboard, and HP-IB. 
The CPU design consists of an 8-MHz MC68000 microprocessor, 
a 16K×16-bit ROM, a 32K×16-bit RAM of which 
8K×16 bits is nonvolatile, a timer, HP-IB control, 
power-up/down circuitry, and bus interfaces. The CPU communicates 
with each of the hardware processors as memory-mapped 
I/O devices over the system bus, which is implemented 
as a subset of the 68000 bus. 

The other bus in the architecture, the global bus, provides 
a path to the global data RAM. This memory is used by all 
of the hardware processors for data storage. After the data 
has been digitized and filtered to the current frequency 
span, the digital filter stores it into the global data RAM. 
The CPU is signaled over the system bus when a block of 
data has been transferred, and as a consequence, the CPU 
instructs the FFT processor to perform a time-to-frequency 
transformation of the data. The FFT processor accesses the 
global data RAM, transforms the data, stores the result back 
into memory, and then signals the CPU. The CPU now 
commands the FPP to perform appropriate floating-point 
computations. The FPP fetches operands from the global 
data RAM, does the required calculations, and then stores 
the results back into memory. When finished, the FPP signals 
the CPU. The CPU then reads the data, and performs 
coordinate transforms on it as well as format conversions 
in preparation for display. The data is again stored into 
the global data RAM. The CPU finally instructs the display 
interface to transfer the data to the display. (The display 
used is the HP 1345A Digital Display, which requires digital 
commands and data to produce vector images.) 

The above description follows one data block as it is 
transferred from digital filter to display. However, the HP 
3562A Analyzer does not process one block of data at a 
time; it processes four. That is, while display conversions 
of block N-3 are being done by the CPU, block N-2 is 
being operated on by the FPP, block N-1 is being transformed 
by the FFT processor, and block N is being filtered 
by the digital filter. Simultaneous operation of all processors 
provides the computational speed necessary to produce 
a large real-time bandwidth of 10 kHz. 
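
The four-deep overlap described above can be written out as a small schedule. The Python sketch below is purely illustrative: the stage labels follow the text, while the function name and the block numbering from zero are assumptions.

```python
STAGES = ["digital filter", "FFT processor", "FPP", "CPU display conversion"]

def pipeline_state(newest_block):
    """Map each stage to the block it handles while newest_block is filtered."""
    return {stage: newest_block - lag
            for lag, stage in enumerate(STAGES)
            if newest_block - lag >= 0}

for n in range(5):
    print(n, pipeline_state(n))
```

Once the pipeline is full, a finished block emerges every time a new one enters, so throughput is set by the slowest stage and not by the sum of all four.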

As a consequence of the data processing architecture,
the global memory system had two major requirements.
The first was that multiple processors needed access to the
memory. This required an arbiter, a decision maker, to
monitor memory requests and allocate memory accesses
(grants) based on a linear priority schedule. Devices are
serviced according to their relative importance,
independently of how long they have been requesting service. Each
requesting device has associated with it a memory request
signal and a memory grant signal. When a device needs
access to the memory, it asserts its memory request. The
global data RAM prioritizes all requests and issues a
memory grant to the highest-priority device. That device is then
allocated one bus cycle.
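
A linear (fixed) priority arbiter of the kind described above can be
sketched as follows. The priority order in PRIORITY is an assumption for
illustration only; the article does not state which device ranks
highest.

```python
# Hypothetical priority order, highest first (not given in the article).
PRIORITY = ["digital filter", "FFT processor", "FPP", "CPU", "display interface"]

def grant(requests, priority_order=PRIORITY):
    """Return the highest-priority device with an asserted request, or None."""
    for device in priority_order:
        if device in requests:
            return device
    return None

def run_cycles(pending, priority_order=PRIORITY):
    """Grant one bus cycle at a time until every pending request is served.

    How long a device has been waiting plays no part in the decision;
    only its position in the priority list matters.
    """
    pending = set(pending)
    served = []
    while pending:
        winner = grant(pending, priority_order)
        served.append(winner)
        pending.discard(winner)
    return served

if __name__ == "__main__":
    print(run_cycles({"CPU", "FPP", "digital filter"}))
```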

The second major requirement placed on the global
memory system was minimal memory cycle time to satisfy
instrument real-time bandwidth needs. Computer
simulations were done to model the effects of memory cycle time
on real-time bandwidth. It was concluded that cycle times
less than 500 ns would satisfy the instrument's real-time
bandwidth requirements.

To optimize memory cycle time, the timing for the global
memory system is constructed from a digital delay-line
oscillator. Three digital delay lines are connected in series,
with the output of the third delay line inverted and
connected to the input of the first delay line. Individual timing
signals were customized for the 64K×16-bit dynamic
RAMs and arbiter using combinational logic operating on
signals from 10-ns delay taps on the delay line oscillator.
As implemented, global memory cycles are available every
470 ns.

The FFT processor can perform both forward and inverse
fast Fourier transforms and windowing. It is designed
around a TMS320 microprocessor. Although an on-board
ROM allows the TMS320 to execute independently, the
FFT processor is slaved to the CPU. Commands are written
to the FFT processor over the system bus, and the FFT
processor accesses the global data RAM for the data to be
transformed and any user-defined window information.
Operations are executed, and the CPU is interrupted upon
completion. The FFT processor computes a 1024-point,
radix-4 complex FFT in 45 ms.

The FPP is constructed from six AM2903 bit-slice
microprocessors using conventional bit-slice design techniques.
It can operate on three different data formats: two's
complement integer (16 bit), single-precision floating-point (32
bit), or double-precision floating-point (64 bit). Besides
addition, subtraction, multiplication, and division, the FPP
can perform 81 customized operations. A list of FPP
instructions, called a command stack, is stored in the global
data RAM by the CPU. The list consists of a 32-bit command
word (add, subtract, et cetera), the number of entries in the
data block to be operated on, constants to indicate if the
data block is real or complex, the beginning address of the
data block in the global data RAM, and the destination
address for the results. The FPP is then addressed by the
CPU and the starting address of the command stack is given.
The FPP executes the command stack and interrupts the
CPU upon completion.

The digital filter assembly consists of two printed circuit
cards. One card, the digital filter board, contains two sets
(one per input channel) of custom integrated circuits
designed for digital filtering. These integrated circuits were
leveraged from the design of the HP 3561A.² The other
card, the digital filter controller board, supplies local
information to the digital filter board and contains the interface
to the system bus. Digitized data is supplied to the digital
filter from the ADCs at a 10.24-MHz rate. Upon command
from the CPU, the digital filter can operate on the data in
different ways. It can pass the data directly to the global
data RAM (as is required if the user wants to view time
data). It can frequency translate the data by mixing it with
the digital equivalent of the user-specified center
frequency, filter unwanted image frequencies (required in a
zoom operation, which enables the full resolution of the
analyzer to be concentrated in a less than full span
measurement), and store to the global data RAM. Or it can filter
the data without any frequency translation and store to the
global data RAM (required in the baseband mode where
the start frequency is 0 Hz and the span is less than full
span). The data can be operated on simultaneously in
different modes, thus providing the analyzer the ability to
view input data simultaneously with frequency data. The
design is the most complex digital assembly in the HP
3562A, and is implemented using a number of
programmable logic devices, direct memory access (DMA) controllers,
and large-scale-integration counters.

Throughput. Throughput means acquiring input data,
filtering it, processing trigger delays, and writing the data to
the disc as fast as possible. To accomplish this, the HP
3562A has built-in disc drivers for HP Command Set 80
(CS/80) disc drives (e.g., HP 7945 and HP 7914), HP Subset
80 (SS/80) disc drives (e.g., HP 9122 and HP 9133D), and
other HP disc drives such as the HP 9121, HP 9133XV, HP
82901, and HP 9895. These disc drives can be used for
general save and recall functions in addition to throughput.
The drives most suited for throughput are the CS/80 or
SS/80 hard disc drives. The software disc drivers are tuned
for these drives.

The throughput is accomplished with three internal
processors to pipeline the transfer process (see Fig. 6). The
digital filter assembly digitally filters the data for a channel
into a rotating buffer with enough room for two time
records. This allows one time record to be accumulated while
the other record is moved out of the buffer. When the
instrument is acquiring data in triggered mode, the digital filter
assembly is putting the data in the rotating buffer even
before the trigger is received. The data before the trigger is
needed if a pretrigger delay is specified. Since the start of
data (the trigger point if there is no trigger delay) can occur
anywhere in the rotating block, the full time record will not be
contiguous in memory if the start of data is more than halfway
into the buffer. The first part of the data will be at the end
of the buffer and the last part of the data will be at the
beginning of the buffer (i.e., the data is wrapped).
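
The wrapped layout described above can be undone with a two-piece copy.
The following is only a sketch under simple assumptions: the rotating
buffer is modeled as a Python list holding exactly two time records, and
the function name is hypothetical.

```python
def unwrap(buffer, start, record_len):
    """Copy one time record out of a rotating buffer into contiguous order.

    buffer     -- rotating buffer holding two time records
    start      -- index of the first sample of the wanted record
    record_len -- number of samples in one time record
    """
    n = len(buffer)
    # Piece 1: from the start of data up to the end of the buffer (or less).
    first = buffer[start:min(start + record_len, n)]
    # Piece 2: whatever is still missing sits at the beginning of the buffer.
    rest = buffer[:record_len - len(first)]
    return first + rest

if __name__ == "__main__":
    # Record "a b c d" starts at index 6 of an 8-sample buffer, so it wraps.
    wrapped = ["c", "d", "-", "-", "-", "-", "a", "b"]
    print(unwrap(wrapped, start=6, record_len=4))
```

When the start of data is no more than halfway into the buffer, the
second piece is empty and the record is already contiguous.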

The next step is for the FFT processor to move and
unwrap the data into another buffer. This buffer is one of a
set of six rotating buffers. The FFT processor is fast enough
to move an entire time record out of the input block before
the digital filter assembly overwrites it, except for a
100-kHz frequency span, which is faster than real-time anyway.
The CPU moves the contiguous data in time record blocks
through the HP-IB to the disc drive. To cut down on the
overhead, the entire throughput operation is set up as
a single disc write command. The six buffers help to smooth
the nonlinear transfer rate to the disc.

Fig. 6. Throughput data flow during a data acquisition to disc
storage. (All processors are running simultaneously.)

The limit on the transfer rate is usually the CPU
handshaking data out of the HP-IB chip. The CPU executes a
tight assembly language loop to transfer data (about 60
kbytes/s). Most hard discs can receive data faster than this.
This transfer rate translates into a 10-kHz real-time
frequency span for a one-channel measurement.

Since throughput is essentially a real-time operation, the
digital filter channels must never stop transferring data to
memory, or gaps will appear in the data. If both channels
are being throughput to the disc, then the data is interleaved
on the disc in time record increments (Channel 1 followed
by Channel 2). If both channels are being throughput in
triggered mode, then both channels are triggered at the
same time regardless of trigger delay. This allows both
channels to be synchronized on time record boundaries. If
the user specifies a cross-channel delay of greater than one
time record, then an integral number of time records are
thrown away from the delayed channel rather than wasting
space on the disc with unused data. Hence, with the same
number of records throughput to the disc in this mode,
there will be a sequence of the undelayed channel records,
followed by zero or more interleaved records (Channel 1
followed by Channel 2), followed by the remainder of the
delayed channel records.

Since the throughput is normally done with one write
command to the disc, the disc will fill an entire cylinder
before moving the disc heads to the next cylinder (head
step). Each cylinder consists of multiple heads, each on a
separate surface (i.e., a track). The time to step the head
one track is not long. If the track has been spared (i.e.,
because of a bad spot on the disc), the disc head
automatically steps to the replacement logical track, which could
be halfway (or more) across the disc. This could adversely
affect the real-time data rate. To avoid this problem, the
instrument does not use any logical track that has been
spared (replaced) for throughput. There will be a gap in
the logical address space on the disc, but skipping the
spared track only requires two head seek times. The spared
track table can only be read from a CS/80 disc drive. If a
spared track is in the throughput file used, then one disc
write transfers data before the spared track, and another
write is used for data after the spared track. The HP 3562A
can skip up to nine spared tracks in each throughput file
(identified in the throughput header) in this manner.

Any throughput data in the file can be viewed by the
user as a time record or used as a source of data for a
measurement. When performing a measurement of the
throughput data, the data is input back into the digital
filters for possible refiltering. This allows data throughput
at the maximum real-time span (10 kHz, one channel) when
test time is valuable, and then performing the measurement
later at a desired lower span (e.g., 2 kHz). Since throughput
data is processed off-line, more flexibility is allowed on
this data than may be possible with a normal on-line
measurement. Data from anywhere in the throughput file can
be selected for postprocessing (i.e., a selected
measurement) by specifying a delay offset when measuring the
throughput. For a measurement on a real-time throughput
file, the user can specify the exact amount of overlap
processing to be used during averaging. Increased flexibility
is realized by being able to continue a measurement from
anywhere in the throughput file, which allows a
measurement on the throughput data in a different order than it
was acquired.

Operating System 

The operating system used by the HP 3562A can be
described as a multitasking system with cooperating
processes. The operating system provides task switching, process
creation, process management, resource protection
(allocation), and process communication.

The operating system manages the use of processes. A
process is a block of code that can be executed. In a
multitasking system, several processes can be executed
simultaneously. Since there is no time slicing (using a timer
interrupt to split CPU time between processes), cooperation
between processes is essential. To share CPU resources
between processes, explicit suspends back to the operating
system must be performed about every 100 ms.

Since our development language (Pascal) is stack
oriented, there could be a problem with multiple processes.
Does a process share a stack with another process? How
do they share it? Because of demands on the CPU RAM
(i.e., local data, states, et cetera), the stack RAM space was
in short supply, so we chose to implement a hybrid stack
management system.

We started implementing a system with only one stack
instead of a stack for each process. This was accomplished
by making one restriction upon the software designers.
Breaks to the operating system (suspends and waits) could
only be made in the main procedure in a process, which
was not unreasonable for our original code expectations.
Our Pascal development language allocates the procedure
variables on top of the stack upon procedure entry and
afterward references these variables with a base address
register. The stack pointer is not used to reference the
variables. The top of stack can be moved without affecting the
access of the variables in the main procedure. This allows
each process to have its own stack for its main procedure
that is exactly the right size, and also a (possibly separate)
stack for procedure calling. When a process ends, a call
back to the operating system records the stack segment for
the main procedure as being empty so that it can be reused.

This method of stack allocation presented problems
when the code size grew significantly (to 1.1 Mbytes), but
the RAM grew only slightly (to 64K bytes). Some processes
became larger and the restriction on waits became
unwieldy. Therefore, a second method was implemented of
allocating entirely separate stacks (partitions) for some
processes (e.g., calibration and HP-IB control). There was not
enough RAM to do the same for all processes, so a hybrid
system was maintained. A partition concept was defined
to allow a single process to use a partition (allocated at
power-up) or one of a group of processes to use a partition
at any one time (e.g., one of the HP-IB processes). The
addition of partitions allowed us to gain the maximum
utility out of the limited RAM available.

The operating system is semaphore based. The classic
Dijkstra p() and v() operations³ are used for resource
protection and process communication. These operations have
been implemented as counting semaphores with the only
allowable operations of wait (corresponds to p()) and signal
(i.e., v()). A counting semaphore can be defined as a
non-negative integer (it may never become negative).

A signal is defined as an increment operation. A wait is
defined as a decrement operation where the result may not
be negative. If the decrement would cause the result to be
negative, then the process execution is blocked until the
semaphore is incremented (i.e., a signal is sent by some
other process), so that the decrement can be successful.

The actual implementation of a semaphore is different.
The integer does become negative on a wait and the process
is blocked in a queue of waiting processes associated with
each semaphore (semaphore wait queue). A signal where
the previous value before the increment was negative will
unblock the first process in the wait queue (i.e., it can run
again). In this sense, a semaphore count, if negative,
represents the number of processes waiting on the semaphore.

There are generally two types of semaphore use: resource
protection and process communication. A resource
protection semaphore is initialized with a value that is the number
of resources available (generally one). An example of this
in the HP 3562A is the floating-point processor; only one
process can use it at a time. A process uses a resource
semaphore by executing a wait before the block of code
that uses the resource, and a signal after that block of code.
All processes that use this resource must do the same thing.
The first process executing the wait decrements the
semaphore from 1 to 0 before using the resource and
afterward increments (signals) the semaphore back to 1. If a
second process tries to wait on the resource after the first
process has acquired the resource (wait), then the second
process will be blocked (it cannot decrement to -1) until
the first process has signaled (released) the resource. A
semaphore is also used to protect critical sections of code
from being executed by two processes simultaneously.
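
The negative-count implementation described above can be sketched as a
small simulation. This is not the instrument's Pascal code; the class
name, the use of process names as strings, and the return values are
assumptions chosen so the blocking behavior is easy to observe.

```python
from collections import deque

class CountingSemaphore:
    """Counting semaphore whose count goes negative while processes wait."""

    def __init__(self, initial):
        self.count = initial          # resources available (or 0 for messaging)
        self.wait_queue = deque()     # blocked processes, first in first out

    def wait(self, process):
        """Decrement; return True if the process may run, False if it blocks."""
        self.count -= 1
        if self.count < 0:
            self.wait_queue.append(process)
            return False
        return True

    def signal(self):
        """Increment; return the process that is unblocked, or None."""
        previous = self.count
        self.count += 1
        if previous < 0:
            return self.wait_queue.popleft()
        return None

if __name__ == "__main__":
    fpp = CountingSemaphore(1)           # one floating-point processor
    print(fpp.wait("measurement"))       # acquires the FPP
    print(fpp.wait("plot"))              # blocks; count is now -1
    print(fpp.signal())                  # measurement releases; plot runs
```

A negative count therefore equals the number of queued processes, as the
text states.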

A process communication semaphore (initialized to 0)
is used by two processes that have a producer/consumer
relationship. The producer signals the semaphore and the
consumer waits on the semaphore. An example of a
consumer in the HP 3562A is the marker process, which waits
for a marker change before moving the marker on screen,
while the producer is the keyboard interrupt routine, which
detects that a marker change is requested and signals the
marker process.

In many cases, a process communication semaphore is
not enough. The consumer must know more about what is
to be done. Therefore, a communication queue, which has
one entry for each signal by the producer, is associated with
the semaphore. This combination is known as a message
queue. An example is the keyboard interrupt routine
(producer) and the front-panel process (consumer). A queue
containing key codes is associated with the semaphore.

Queues, a first-in-first-out construct, are also supplied
by the operating system. As mentioned above, queues are
mainly used to communicate between processes, but are
also used by the operating system (i.e., the process ready
queue) and other processes to keep a list of information.
Another variant of the queue definition is the priority
queue, where an addition-to-a-queue operation inserts new
entries in order by a priority key. The operating system
process ready queue is an example of a priority queue in
the HP 3562A. The priority is set by the scheduling process
to determine the relative execution order of several processes.

Command input to the HP 3562A is from three processes
representing the keyboard, autosequence, and HP-IB. Each
process accepts codes from the appropriate source,
converts them to a common command function format, and
invokes the command translator. The command translator
controls the main user interface, the softkey menus,
command echo field, numeric entry, and translation of the
command function into a subroutine call to do the requested
operation. The command translator is table driven. A single
data base of all keys is maintained with all related softkey
menus and actions to be performed when a key is pressed.
A single command translator presents a consistent interface
to the user, regardless of the source of the commands.

The following is an example of the power of a
multitasking operating system. It shows how four processes can be
run at the same time within the HP 3562A. There is a
measurement running on one trace, a view input on another
trace, the display process itself, and a plot process. To see
the results, press:

PRESET  RESET                  (put HP 3562A in known state and
                               start measurement)
UPPER LOWER                    (display both traces)
B  VIEW INPUT  INPUT TIME 1    (start view input in other trace)
PLOT  START PLOT               (start plot process)



The plotting is performed in parallel with the measure- 
ment and view input. This is called "plot on the fly." 
Resource semaphores are used to protect hardware from 
simultaneous use by several processes. Communication 
semaphores are used to communicate display status to data 
taking processes. Buffer lock flags are used to protect the 
display buffers from use by both the display and the plot. 

In summary, the use and implementation of the operating
system in the HP 3562A show the trade-offs that occurred
during project development. The operating system has
enough power and flexibility to implement otherwise very
difficult operations (e.g., plotting during a measurement),
but is simple enough not to impose much overhead. Less
than 2% of the processor time is spent in the operating
system.



Autosequence

An autosequence is a programmable sequence of key
presses that can be invoked by a single key, which makes
repetitive operations easier for the user. It is the simple
programming language built into the HP 3562A. In the
learning mode (edit autosequence), all completely entered
command lines, from hard key to terminating key, are
remembered in the autosequence. The line-oriented editor
allows line deletion, addition, and replacement. A looping
primitive and a goto command allow repetitive commands.
Other autosequences can be called as subroutines. The
autosequence can be labeled with two lines of text, which
will replace the key label that invokes the autosequence.

Up to five autosequences in the instrument can be stored
in battery-backed CMOS RAM. Since CMOS RAM is
limited, an implementation was chosen that takes minimal
memory to store an autosequence. The storage mechanism
saves either the keycode (0 through 69) for the key pressed
(only takes one byte) instead of the HP-IB mnemonic (4
bytes), or a unique key function token (over 900 functions
defined by 2 bytes). This allows storage for an autosequence
of 20 lines with an average of 10 keys/line. The command
strings seen in the autosequence edit mode are only stored
in a temporary edit buffer (not stored in CMOS RAM).
When the edit mode is engaged again, the stored keys are
processed through the command translator without
executing the associated actions to rebuild the command strings.
The commands are shown on the screen as they are being
rebuilt.

The softkey menus of the HP 3562A may vary when the
measurement mode changes. For example, in swept sine
mode, there is a softkey STOP FREQ under the FREQ key.
In linear resolution mode, the ZERO START softkey is in the
same location (softkey 4). Key codes (e.g., softkey 4) are
stored in the autosequence. If an autosequence that has
this key sequence in it is run in both modes, a different
action could take place (zero start versus stop frequency).
This is not what the user expects. To solve this problem,
any time a variant softkey menu is displayed because of a
key press, the variant menu number is stored in the
autosequence following the key. When the autosequence is
edited, the variant menu is forcibly displayed so that the
user always sees the same command echo regardless of the
measurement mode. When the autosequence is executed,
the actual key function (e.g., stop frequency) is obtained
by looking up the softkey number in the current softkey
menu. If the key function is found, the new key code is
executed instead of the old key code. This allows softkey
functions to move in the menu in different modes. If the
key function is not found, an error results and the
autosequence stops. The additional overhead for this function is
one byte for each variant menu displayed in the
autosequence. Although the storage overhead is not very
significant, the software to execute the autosequence became
more complex.

While executing the autosequence, the front-panel keys
are locked out (except AUTOSEQ PAUSE ASEQ). This is
necessary because there is only one base translator with three
command sources (keyboard, autosequence, and HP-IB),
and only one current menu can be remembered at a time.
For instance, from the keyboard, the user presses WINDOW
UNIFRM (softkey 3), and from autosequence, SELECT MEAS
FREQ RESP (softkey 1). If the key presses were interleaved
between the two command sources when the functions are
processed, then the commands would appear to be WINDOW
SELECT MEAS AUTO CORR (softkey 3). The wrong
functionality would be invoked in this case. To prevent
this situation, these two processes are synchronized and
the autosequence lets the HP-IB commands execute only
between autosequence lines. Each autosequence line
always starts with a hard key (i.e., it has no memory of a
previous softkey menu).

Autosequence commands may spawn child processes to
do specific actions (e.g., a measurement). The front panel
will continue to accept a new command before the child
process is finished, but the autosequence will wait until
the child process is finished before executing the next line.
The required synchronization is provided by the operating
system, which keeps track of the status of the children of
the command front ends and communicates the
information through a semaphore to the command source (e.g.,
autosequence).

Fig. 7. Help table structure (top), definitions of two-byte
character values (middle), and examples of token table
entries (bottom).
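
The variant-menu lookup described in the Autosequence section can be
sketched as follows. This is one possible reading of the mechanism, not
the instrument's code: the menu contents, the menu numbers, and the
function name are all hypothetical. The stored variant menu number and
softkey number identify the intended key function, and that function is
then searched for in whichever menu is current at execution time.

```python
# Hypothetical softkey menus under the FREQ hard key, indexed by variant number.
VARIANT_MENUS = {
    1: ["START FREQ", "SPAN", "CENTER FREQ", "STOP FREQ"],   # assumed swept sine menu
    2: ["START FREQ", "SPAN", "CENTER FREQ", "ZERO START"],  # assumed linear resolution menu
    3: ["STOP FREQ", "START FREQ", "SPAN", "CENTER FREQ"],   # assumed menu with the key moved
}

def resolve_softkey(stored_menu, stored_softkey, current_menu):
    """Map a recorded softkey press onto the menu that is current now.

    Returns (softkey number to execute, key function name).
    Raises LookupError when the function is absent from the current menu,
    which corresponds to the autosequence stopping with an error.
    """
    function = VARIANT_MENUS[stored_menu][stored_softkey - 1]
    labels = VARIANT_MENUS[current_menu]
    if function not in labels:
        raise LookupError(f"{function} not available in menu {current_menu}")
    return labels.index(function) + 1, function

if __name__ == "__main__":
    print(resolve_softkey(1, 4, 1))   # same menu: softkey 4 is still STOP FREQ
    print(resolve_softkey(1, 4, 3))   # function moved: executes softkey 1 instead
```

Storing the variant menu number costs one byte per menu displayed, which
is the overhead quoted in the text.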

Help Text Compression 

The HP 3562A has a help feature for 652 soft and hard
keys in the instrument. Help information is derived from
the HP-IB mnemonic for the key pressed. The help display
consists of information derived from the key table (full
command name, type of key, number and type of
parameters) and a text description of the key. The text description
can take the entire display area (48 columns by 20 lines).
If the average text description per key is five lines, then
just the text portion of the help feature would consume
156,000 bytes. The actual help text for the HP 3562A takes
157,125 characters (bytes).

To save ROM space, the text is compressed before putting
it in ROM (see Fig. 7). The method used is a token
replacement of duplicate words used in the text. The two-pass
compression program reads the input text on the first pass,
breaks it into words, and updates an occurrence count for
each unique word (29,5119 unique words). At the end of
the first pass, the words are sorted by their occurrence
frequency. Token tables are created for multiple-use words
to be replaced in the text during the second pass.
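
The two-pass scheme described above can be sketched in a few lines. This
is only an illustration of the idea, not the original HP 1000 program:
the function names are hypothetical, words are taken to be runs of
letters, and tokens are represented as Python tuples rather than packed
bytes.

```python
import re
from collections import Counter

def build_token_table(text, max_tokens):
    """Pass 1: count word occurrences and keep the most-used repeated words."""
    counts = Counter(re.findall(r"[A-Za-z]+", text))
    repeated = [word for word, n in counts.most_common() if n > 1]
    return repeated[:max_tokens]

def compress(text, table):
    """Pass 2: replace every table word with a token that indexes the table."""
    index = {word: i for i, word in enumerate(table)}
    out = []
    for piece in re.findall(r"[A-Za-z]+|[^A-Za-z]+", text):
        if piece in index:
            out.append(("TOKEN", index[piece]))
        else:
            out.append(("TEXT", piece))
    return out

def expand(items, table):
    """Rebuild the original text from tokens and literal pieces."""
    return "".join(table[v] if kind == "TOKEN" else v for kind, v in items)

if __name__ == "__main__":
    help_text = "press the key to set the span; press the key again"
    table = build_token_table(help_text, max_tokens=64)
    packed = compress(help_text, table)
    print(table)
    print(expand(packed, table) == help_text)
```

Words that occur only once are left as literal text, since a token plus
a table entry would cost more than the word itself.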

The ASCII character set (unextended) is a seven-bit code.
To encode the tokens in the output text, the high-order bit
(bit 7) is set in the byte. This allows 128 tokens to be
encoded in a byte, which is not enough. A word token
allows more tokens (32,767), but takes twice the memory
to represent the output text. To compromise, both byte
tokens and word tokens are used. The 64 most-used words
are encoded into a byte token (bit 6 set) and 32,767 other
text functions can be encoded into a word token.

The word text functions are split into two groups: word
tokens and special functions. Special functions have bits
4 and 5 set, allowing 4095 functions to be encoded and
leaving 12,288 possible word tokens (1548 actually used).
The special functions are split into two groups: special
formatting commands and text macros. The text expander
in the HP 3562A also formats the displayed text. The text
in the help table is in free format, just words separated by
spaces and punctuation. The expander knows how to
left-justify text on the screen and break lines on word
boundaries. In addition, it knows two special formatting
commands: break line (go to next line) and start paragraph. The
last of the special functions that are encoded are text
macros. Groups of words can be defined and included in
multiple places (useful to describe numbers). There is space
for up to 4094 text macros. The 64 most-used byte tokens
save almost as much space as the 1548 word tokens (45,189
bytes versus 55^63 bytes).
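
One way to picture the byte-token and word-token split is the decoder
sketch below. The exact bit layout is an assumption, since Fig. 7 is not
reproduced here: bit 7 marks a token, bit 6 marks a one-byte token, and
in a two-byte token bits 5 and 4 of the first byte both set mark a
special function. Under that assumed layout the counts of 64 byte tokens
and 12,288 possible word tokens quoted above fall out directly.

```python
def classify(data, i):
    """Classify the item starting at data[i]; return (kind, value, size in bytes)."""
    b = data[i]
    if b < 0x80:                       # bit 7 clear: plain seven-bit ASCII
        return ("char", chr(b), 1)
    if b & 0x40:                       # bits 7 and 6 set: one-byte token
        return ("byte_token", b & 0x3F, 1)
    value = ((b & 0x3F) << 8) | data[i + 1]   # 14-bit payload of a two-byte token
    if (b & 0x30) == 0x30:             # bits 5 and 4 set: special function
        return ("special", value & 0x0FFF, 2)
    return ("word_token", value, 2)

def count_kinds():
    """Count how many distinct codes each token kind can take in this layout."""
    totals = {"byte_token": 0, "special": 0, "word_token": 0}
    for first in range(0x80, 0x100):
        kind, _, size = classify(bytes([first, 0]), 0)
        totals[kind] += 1 if size == 1 else 256
    return totals

if __name__ == "__main__":
    print(count_kinds())
```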

To save as much ROM space as possible in the help table,
several concessions to size, space, and speed were made.
To make the pointers to text shorter, 16-bit unsigned offsets
are used instead of 32-bit pointers in the help mnemonic
table (converts HP-IB mnemonics to text addresses), the
text macro table, and the word token index table. Later,
the help mnemonic table had to be changed to 32-bit
pointers because the size of the tables and the help text had
exceeded 64K bytes (the limit of a 16-bit offset). Since
words vary in size, the byte and word token tables are
organized as a linked list. Hence, they must be traversed
to access any given token. This is acceptable for the 64
byte tokens, but takes too long for the 1548 word tokens.
Therefore, a word token index table was created to index
into every 64 word tokens, keeping the worst-case lookup
sequence to 64 entries.
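
The index-table idea can be sketched as follows. The storage format is
an assumption made for illustration: each variable-length entry is
stored as a length byte followed by the word, so the only way to reach
the next entry is to step over the current one, much like following a
link. The function names are hypothetical.

```python
def build_tables(words, stride=64):
    """Pack words as length-prefixed entries and index every stride-th entry."""
    blob = bytearray()
    index = []
    for k, word in enumerate(words):
        if k % stride == 0:
            index.append(len(blob))     # offset of entry k within the blob
        encoded = word.encode("ascii")
        blob.append(len(encoded))       # length byte acts as the link to the next entry
        blob += encoded
    return bytes(blob), index

def lookup(blob, index, token, stride=64):
    """Find a word by jumping to the nearest indexed entry, then walking forward."""
    offset = index[token // stride]
    for _ in range(token % stride):     # at most stride - 1 entries are skipped
        offset += 1 + blob[offset]
    length = blob[offset]
    return blob[offset + 1:offset + 1 + length].decode("ascii")

if __name__ == "__main__":
    words = [f"word{i}" for i in range(1548)]
    blob, index = build_tables(words)
    print(len(index), lookup(blob, index, 1000))
```

With 1548 entries and a stride of 64, the index holds 25 offsets, and no
lookup has to walk past more than one group of 64 entries.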

Compressing text to save room can be a good investment.
In the HP 3562A, the text is compressed down to 36% of
its original size through word replacement, saving 100,000
bytes of ROM.

Compression may be a good idea, but how long does it
take to compress and decompress the text? The
compression into assembler code takes approximately six hours on
an HP 1000 Computer. However, the decompression of a
worst-case full page takes less than a second, so it is
acceptable for a help display.




Fig. 8. Automath plot for group delay calculation.

Automath

The HP 3562A includes a rich set of math operations
available to the user to postprocess the measurement data.
In addition, automath adds the ability to perform
calculations on the fly during a measurement for custom
measurement displays (e.g., group delay). Automath also adds the
ability to title a custom measurement display on the trace.

In many cases the automath calculation does not slow
down the measurement appreciably because most
math operations (except log and FFT-type operations) are
performed by the floating-point processor (FPP) in parallel
with most operations in a measurement.

Automath is in reality an autosequence that is limited
to math, measurement display, and active trace keys. When
the MEAS DISP AUTO MATH keys are pressed, the automath
autosequence is run to produce a sequence of math
operations and the initial measurement displays are set. The
measurement converts the sequence of math operations
into a stack of FPP commands and appends it to its own
FPP stack. This stack is retained for use on every
measurement average; the stack is only rebuilt if something changes
(e.g., a reference waveform is recalled). Considerable time
is saved in this optimal case. For operations that the FPP
does not execute directly, there is a pseudo FPP command
that causes the FPP to stop and interrupt the CPU, which
interprets and executes the command and then restarts the
FPP upon completion. Since the math commands are
generated once, a copy of the changed state of the data header
must be saved during generation to be restored on every
data average while the FPP computes the automath.

Fig. 8 shows the resultant traces generated by automath
for group delay, where group delay = -Δphase/Δfrequency.
Math is applied to the complex data block, not the
coordinates that are displayed (e.g., magnitude and phase), so
the phase of the frequency response must be expressed in
complex form. The natural log of the complex data will
put the phase information in the imaginary part of the data.
Multiplying by +j1 results in negative phase being in the
real part of the data. The real-part command clears the
imaginary part of the data, resulting in negative-phase
information only. The differentiate command completes the
group delay by differentiating with respect to frequency.
Fig. 9 shows the phase of the frequency response
measurement used by automath for the plot in Fig. 8.
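
The sequence of math operations described above (natural log, multiply
by j, take the real part, differentiate) can be checked numerically. The
sketch below is not the FPP command stack; it assumes phase in radians
and frequency in hertz, so the derivative is taken with respect to
angular frequency (a factor of 2π) to obtain a delay in seconds. The
test response is an ideal 1-ms delay, chosen so that the phase stays
within ±π and no unwrapping is needed.

```python
import cmath
import math

def group_delay(freqs_hz, response):
    """Group delay from a complex frequency response using the log method."""
    # ln(H) = ln|H| + j*phase, so j*ln(H) = j*ln|H| - phase.
    # Keeping only the real part leaves the negative phase.
    neg_phase = [(1j * cmath.log(h)).real for h in response]
    delays = []
    for k in range(1, len(freqs_hz)):
        d_omega = 2 * math.pi * (freqs_hz[k] - freqs_hz[k - 1])
        delays.append((neg_phase[k] - neg_phase[k - 1]) / d_omega)
    return delays

if __name__ == "__main__":
    tau = 1e-3                                   # ideal delay of 1 ms
    freqs = [10.0 * k for k in range(41)]        # 0 to 400 Hz
    h = [cmath.exp(-2j * math.pi * f * tau) for f in freqs]
    print(group_delay(freqs, h)[:3])
```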
Autocalibration

The HP 3562A calibration software has two parts. First, 
a calibration measurement routine calculates complex- 
valued correction data for the two input channels. These 
calibrations can run automatically to account for tempera- 
ture-induced drift in the input circuits, including the 
power-up temperature transient. Second, correction curves 
(one for each display trace) are generated on demand for 
use by the measurement software. These correction curves 
combine the input channel correction data with analytic 
response curves for the digital filters to provide complete 
calibration of the displayed measurement results. 

The HP 3562A input channels have four programmable
attenuators.¹ The first, or primary, attenuator pads (0 dB,
-20 dB, and -40 dB) each have significantly different
frequency response characteristics. As a result, separate
calibration data is required for each primary pad. Since
the HP 3562A measurement software can perform both
single-channel and cross-channel measurements, it may
demand any of 15 distinct correction curve formats (see
Fig. 10). Rather than collect 15 sets of data during calibration,
six sets are carefully chosen so that the remaining
nine sets can be derived.
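
The count of 15 formats follows from three primary pads in each of two channels: six single-channel curves plus nine cross-channel combinations. A short Python enumeration makes the arithmetic explicit. The label strings follow the naming used in the Fig. 10 caption; the variable names are illustrative only.

```python
from itertools import product

PADS = ("00", "20", "40")  # primary pad settings: 0, -20, and -40 dB

# One single-channel curve per pad per channel (A = Channel 1, B = Channel 2).
single_channel = [f"{ch}{pad}" for ch in ("A", "B") for pad in PADS]

# One cross-channel curve per combination of Channel 2 and Channel 1 pads.
cross_channel = [f"B{b}/A{a}" for b, a in product(PADS, PADS)]

formats = single_channel + cross_channel
print(len(single_channel), len(cross_channel), len(formats))  # 6 9 15
```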

The first choice for these basic six sets would be the
single-channel calibrations $A_{00}$, $A_{20}$, $A_{40}$, $B_{00}$, $B_{20}$, and $B_{40}$
(the subscripts refer to a particular pad of the primary attenuator).
The HP 3562A single-channel calibration hardware
and techniques are adapted from the HP 3561A.⁴ Two
classes of errors can be identified in this technique. One
class of errors has identical effects in both channels, and
therefore has no effect on the accuracy of a cross-channel
correction derived from single-channel corrections. Finite-resolution
trigger phase correction is an example of such
an error term. The other class of errors has different effects
in each channel, and therefore will degrade the accuracy
of derived cross-channel corrections. Slew-rate limiting of
the pseudorandom noise calibration signal by the input
channel amplifiers is an error of the second class. While
the cumulative errors in this second class are small with
respect to the single-channel phase specification (±12 degrees
at 100 kHz), they are too large to provide state-of-the-art
channel-to-channel match performance.

Therefore, the basic six calibration data sets must allow
cross-channel corrections to be derived with no contribution
from single-channel terms. The highlighted elements
of Fig. 10 identify the basic six sets. A typical relationship
between derived and basic cross-channel corrections is:

H-ii,- 'A.,., |H.4u •' A^i||B t , f j/A|)„|/[B { | /A 2 a| 

where the square brackets enclose stored terms.

Single-channel correction data is derived from single-channel
and cross-channel terms. For example:



H. 



[AooHB^Aw] 



Fig. 9. Plot of phase for frequency response in Fig. 8.



where IJ HI A MI is derived as shown above, 

The five basic cross-channel corrections are, like the
single-channel correction, measured during the calibration
routine. The periodic chirp source is connected internally
to both channels for a bare-wire frequency response measurement,
a direct measure of input channel match. The
periodic chirp provides even distribution of energy over
the whole frequency span of interest. It is also synchronized
with the sample block duration to eliminate leakage effects.

Now consider the corrections for the second, third, and
fourth input attenuators (secondary pads). The primary pad
correction data already contains corrections for the particular
secondary pads engaged during primary pad calibration.
Thus, secondary pad correction must account for changes
in the input channel frequency response when different
secondary pads are engaged. These relative secondary pad
corrections are negligible when compared with the single-channel
specification. However, their effects are significant
for channel-to-channel match performance.

A pair of bare-wire channel-to-channel match measurements
is used to generate relative calibration data for the
secondary pads. One channel-to-channel match measurement
is used as a reference; the absolute correction for its
secondary pad configuration is contained within the primary
pad data. In the other channel-to-channel match measurement,
one secondary pad is switched in one channel.
The ratio of these two channel-to-channel match measurements
is the relative correction for the switched pad with
respect to the reference pad. For example, consider a reference
configuration of 0-dB primary pad (subscript 00) and
0-dB, 0-dB, 0-dB secondary pads (subscript 0,0,0) in both
channels. Increment the last attenuator in Channel 2 to 2
dB for a configuration of 00-0,0,2. Then the relative correction
for the -2-dB pad in the last attenuator of Channel 2
is calculated from:

$$\frac{B_{00\text{-}0,0,2}/A_{00\text{-}0,0,0}}{B_{00\text{-}0,0,0}/A_{00\text{-}0,0,0}} = \frac{B_{00\text{-}0,0,2}}{B_{00\text{-}0,0,0}}$$


For an arbitrary secondary pad configuration, the relative
corrections for each attenuator are combined. The relative
secondary pad corrections can be modeled over the 0-to-100-kHz
frequency span as a magnitude offset and a phase
ramp. With this simplification, secondary pad calibration
data can be measured at one frequency using the HP
3562A's built-in sine source. The combined secondary pad
correction is then a magnitude-offset/phase-ramp adjustment
to the primary pad correction. The resultant correction
curve produces channel-to-channel match accuracy of
±0.1 dB and ±0.5 degree.
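
As a rough illustration of the magnitude-offset/phase-ramp model, the Python sketch below extrapolates single-frequency relative corrections across the span and combines them by multiplication. Several things here are assumptions rather than details from the instrument: the function names, the measurement frequency of 1 kHz, the sample correction values, and the choice of a phase ramp that passes through zero at dc.

```python
import cmath

def pad_correction(measured, f_meas_hz, f_hz):
    """Extrapolate one relative pad correction, measured at f_meas_hz, to f_hz."""
    magnitude = abs(measured)                             # constant magnitude offset
    phase = cmath.phase(measured) * (f_hz / f_meas_hz)    # assumed ramp through 0 at dc
    return cmath.rect(magnitude, phase)

def combined_correction(measurements, f_meas_hz, f_hz):
    """Combine the relative corrections of all switched secondary pads."""
    total = 1 + 0j
    for measured in measurements:
        total *= pad_correction(measured, f_meas_hz, f_hz)
    return total

# Hypothetical relative corrections for two secondary pads, measured at 1 kHz.
measured = [cmath.rect(0.99, -0.002), cmath.rect(1.01, 0.001)]
c = combined_correction(measured, 1000.0, 50000.0)
```

In this model the combined correction at any frequency is a single complex multiplier applied on top of the primary pad correction curve.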
Fig. 10. Single-channel and cross-channel primary-attenuator
pad correction curve variations. $B_{40}$ represents the
reciprocal of the frequency response of Channel 2 with the
-40-dB pad engaged. $B_{40}/A_{20}$ represents the correction
curve for a cross-channel measurement with primary attenuator
pads specified for each channel (40 dB in Channel
2, 20 dB in Channel 1). The six shaded values are stored in
the HP 3562A and the other nine values are derived from the
stored values.
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Measurement Modes and Digital 
Demodulation for a Low-Frequency 
Analyzer 

by Raymond C. Blackham, James A. Vasil, Edward S. Atkinson, and Ronald W. Potter



THE HP 3562A DYNAMIC SIGNAL ANALYZER provides
three different measurement modes for low-frequency
spectrum and network analysis from 64 µHz
to 100 kHz within one instrument with two input
channels and a dynamic range of 80 dB:

■ Swept sine 

■ Logarithmic resolution 

■ FFT-based linear resolution. 

These measurement modes use advanced digital signal processing
algorithms to provide more accurate and more repeatable
measurements than previously available with conventional
analog circuit approaches.

Swept Sine Mode 

The swept sine measurement is based on the Fourier
series representation of a periodic signal. Consider a
stimulus signal:

$$s(t) = \sum_{n=-\infty}^{\infty} c_n e^{jn\omega t}$$

applied to the device under test (DUT) and a response
signal:

$$r(t) = \sum_{n=-\infty}^{\infty} d_n e^{jn\omega t}$$

where $c_i$ and $d_i$ are complex numbers, and $c_{-i} = c_i^*$ and
$d_{-i} = d_i^*$, so that s(t) and r(t) are real signals. Ideally, s(t) is
a perfect sine wave, with perhaps a dc offset, and $c_n = 0$ for
n > 1. The frequency response of the DUT at frequency f =
ω/2π Hz is defined to be $d_1/c_1$. The swept sine measurement
is a series of measurements of $d_1$ and $c_1$ for different
frequencies.

The HP 3562A assumes that s(t) is connected to its Channel
1 input and r(t) is connected to its Channel 2 input. $c_1$
is calculated using the standard Fourier series integral:

$$c_1 = \frac{1}{T}\int_0^T s(t)\,e^{-j\omega t}\,dt,$$

where T = 2π/ω = 1/f. $d_1$ is calculated in the same way
using r(t) in place of s(t).

The HP 3562A carries out this calculation on both input
signals as follows (given for s(t) only):

1. The signal is sampled at intervals of $T_s = 1/f_s$, with $f_s$
= the sample frequency. Let $s_n = s(nT_s)$ for n = 0, 1, 2, ....

2. $s_n$ is multiplied by $e^{-jn\omega T_s}$ in real time using a digital local
oscillator.

3. Samples spanning the time $(N + 19)T_s$, where N is a
positive integer, are used in a numerical integration algorithm
to calculate an approximation to:

$$\frac{1}{MT}\int_0^{MT} s(t)\,e^{-j\omega t}\,dt$$

where M is a positive integer, MT ≤ (user-entered integration
time) < (M + 1)T, and $(N-1)T_s \le MT < NT_s$.

4. $|c_1|^2$, $|d_1|^2$, and $d_1 c_1^*$ (the trispectrum) are calculated.

5. Steps 3 and 4 are repeated the number of times specified
by the user-entered number of averages.

6. The values of $|c_1|^2$ thus calculated are averaged together
and stored. The same is done for $|d_1|^2$ and $d_1 c_1^*$ (trispectrum
average).
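
The six steps above can be sketched in Python for a single frequency point. This is a simplified illustration under stated assumptions: a plain rectangular sum over an integer number of cycles stands in for the instrument's numerical integration algorithm, and the function names, sample rate, and test signals are hypothetical.

```python
import cmath
import math

def fourier_coefficient(samples, freq_hz, fs_hz):
    """Estimate the fundamental Fourier coefficient of a sampled signal."""
    w = 2 * math.pi * freq_hz
    ts = 1.0 / fs_hz
    # Step 2: multiply by the local oscillator exp(-j*n*w*Ts).
    mixed = [s * cmath.exp(-1j * w * n * ts) for n, s in enumerate(samples)]
    # Step 3 (simplified): rectangular sum instead of composite quadrature.
    return sum(mixed) / len(mixed)

def swept_sine_point(stim_blocks, resp_blocks, freq_hz, fs_hz):
    """Steps 3 to 6: trispectrum average over several integration blocks."""
    sum_cc = sum_dd = 0.0
    sum_dc = 0j
    for s_blk, r_blk in zip(stim_blocks, resp_blocks):
        c1 = fourier_coefficient(s_blk, freq_hz, fs_hz)
        d1 = fourier_coefficient(r_blk, freq_hz, fs_hz)
        sum_cc += abs(c1) ** 2
        sum_dd += abs(d1) ** 2
        sum_dc += d1 * c1.conjugate()
    k = len(stim_blocks)
    return sum_cc / k, sum_dd / k, sum_dc / k

fs, f = 1000.0, 50.0
n_samples = 200  # exactly 10 cycles of the 50-Hz stimulus
stim = [math.cos(2 * math.pi * f * n / fs) for n in range(n_samples)]
resp = [0.5 * math.cos(2 * math.pi * f * n / fs - math.pi / 4)
        for n in range(n_samples)]
g_cc, g_dd, g_dc = swept_sine_point([stim, stim], [resp, resp], f, fs)
h = g_dc / g_cc  # frequency response estimate d1/c1
```

With this test data the DUT has a gain of 0.5 and a phase lag of 45 degrees, and `h` recovers both.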
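Integration Algorithm. Since:

$$\int_0^T e^{j(n-1)\omega t}\,dt = \begin{cases} T & \text{for } n = 1 \\ 0 & \text{for } n \ne 1 \end{cases}$$

we can simplify:

$$\frac{1}{T}\int_0^T s(t)\,e^{-j\omega t}\,dt = \frac{1}{T}\int_0^T e^{-j\omega t}\left(\sum_n c_n e^{jn\omega t}\right)dt = \frac{1}{T}\sum_n c_n \int_0^T e^{j(n-1)\omega t}\,dt$$

and we obtain:

$$\frac{1}{T}\int_0^T e^{-j\omega t}\,s(t)\,dt = c_1$$

as desired.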

The integral can be evaluated separately for the real and
imaginary parts. However, if the integral is not evaluated
exactly, the terms for which n≠1 will not cancel completely
and will contribute an error to the estimate of $c_1$. This error
could be particularly severe if there is a large dc offset or
significant harmonic components in s(t). The integration
algorithm is designed to guarantee that any contribution
from a component $c_i$ of s(t) for i≠1 to the integral estimate
is a factor of $3\times10^{-5}$ (90 dB) less than $c_i$. For example, if
$|c_1| = 1$ and $|c_0| = 10$, the resulting estimate of $c_1$ would be
affected by the $c_0$ term by a factor of no more than
$10 \times (3\times10^{-5}) = 3\times10^{-4}$, or -70 dB.

The numerical integration is performed using a fifth-order
composite quadrature integration formula as follows.
Consider six adjacent samples of $s_n e^{-jn\omega T_s}$. Find the fifth-order
polynomial passing through these six points. Integrate
this polynomial between the middle two points. Now
move everything forward in time one sample (overlapping
five samples) and repeat the procedure (see Fig. 1).

The interval over which this polynomial is integrated is
contiguous with the previous integration interval. The sum
of these two integrals is an approximation of the integral
of the continuous signal $s(t)e^{-j\omega t}$ over the two sample
periods. Continuing, we can cover as large an interval as
desired that is a multiple of the sample period $T_s$. This
integration method is similar to Simpson's method of numerical
integration (also a composite quadrature formula),
which integrates a second-order polynomial passing
through three adjacent samples over two sample periods
as shown in Fig. 2. Note that the fifth-order formula used
in the HP 3562A uses two points beyond each end of the
integration interval.

Now, to allow integration over an arbitrary interval rather
than a multiple of the sample period $T_s$, a separate quadrature
integration formula is calculated to integrate over a
portion of a sample period at the end of the integration
interval, as shown in Fig. 3. This last quadrature formula
depends on the value of MT, the interval of integration of
the Fourier series integral, which, in turn, depends on the
frequency of the signal component we are trying to estimate.
Therefore, this formula must be recalculated for each
point in the frequency sweep of the measurement. Fortunately,
it is a relatively simple calculation. For more information,
many numerical analysis texts, such as reference
1, contain sections on general composite quadrature
integration methods.
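
Both the full-interval formula and the partial-interval formula come from the same construction: integrate the polynomial that passes through six adjacent samples. The Python sketch below derives the weights exactly with rational arithmetic, with the sample period normalized to 1. The function name, the node numbering (-2 to 3, with the middle interval running from 0 to 1), and the example partial interval of one quarter of a sample period are assumptions made for illustration; the instrument's actual coefficients are not given in the text.

```python
from fractions import Fraction

def interpolant_weights(nodes, a, b):
    """Weights w_k such that sum(w_k * y_k) equals the integral from a to b
    of the polynomial passing through the points (nodes[k], y_k)."""
    weights = []
    for k, xk in enumerate(nodes):
        # Build the Lagrange basis polynomial L_k(x), lowest power first.
        coeffs = [Fraction(1)]
        for m, xm in enumerate(nodes):
            if m == k:
                continue
            denom = Fraction(xk - xm)
            new = [Fraction(0)] * (len(coeffs) + 1)
            for p, c in enumerate(coeffs):
                new[p] += c * Fraction(-xm) / denom   # constant part of (x - xm)
                new[p + 1] += c / denom               # x part of (x - xm)
            coeffs = new
        # Integrate L_k(x) term by term between a and b.
        integral = sum(c * (Fraction(b) ** (p + 1) - Fraction(a) ** (p + 1)) / (p + 1)
                       for p, c in enumerate(coeffs))
        weights.append(integral)
    return weights

NODES = (-2, -1, 0, 1, 2, 3)
FULL = interpolant_weights(NODES, 0, 1)                  # middle interval
PARTIAL = interpolant_weights(NODES, 0, Fraction(1, 4))  # part of a sample period
```

The partial-interval weights depend on the interval length, which is consistent with the need to recalculate that formula at each point of the frequency sweep.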

This numerical integration method, when applied to a
signal of the form $e^{j\omega t}$, loses accuracy as the frequency
ω increases. For example, using this method to estimate
the value of:

$$\int_0^T e^{j\omega t}\,dt = 0$$

results in a value with magnitude less than $3\times10^{-5}$ (90
dB below the signal level) for frequencies such that there are
at least ten samples per cycle, or for $T_s < 0.1(2\pi/\omega)$. Hence,
limiting the frequency components of the incoming signal
to $0.1f_s$, where $f_s$ = the sample frequency, would guarantee
a sufficient amount of rejection of the terms in the Fourier
expansion of s(t) for which n≠1. If we used Simpson's rule
instead, this limit would be considerably lower.

Fig. 2. Second-order polynomial integration using Simpson's
method.

To attenuate the errors caused by terms in s(t) above this
frequency limit, the signal $s_n e^{-jn\omega T_s}$ is passed through a short
low-pass FIR (finite impulse response) filter with a gain of
1.0 at dc. This FIR filter is of the same form as the quadrature
integration formulas, and can be combined with them as
if they were two consecutive FIR filters. The result is a
modified composite quadrature integration formula which
incorporates the attenuation characteristics of both the
fifth-order composite quadrature formula and the FIR filter,
and therefore will attenuate contributions of terms in the
expansion of s(t) by at least 90 dB for all such terms between
dc and $f_s/2$. The form this formula takes is a dot product
between samples of the input signal and a vector of coefficients
as illustrated in Fig. 4.

The number of coefficients in the leading and ending
portions of the coefficient vector is the same for all integration
intervals MT. The number of coefficients of value
1 in the center of the coefficient vector increases with MT.
In the HP 3562A, there are 18 leading and 19 ending coefficients.
Because of the separate quadrature formula for the
last (partial) interval, the coefficients are not symmetric
(e.g., the ending coefficients are not a reversed version of
the leading coefficients).
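
The structure of the coefficient vector (leading coefficients, a run of ones, ending coefficients) can be reproduced with a small Python sketch. It overlaps the six-point weights of the fifth-order interpolant, one set per sample interval, and then convolves the result with a short FIR filter of unity dc gain. The three-tap filter shown is hypothetical and much shorter than anything that would give 90 dB of rejection, and the partial last interval is omitted, so the vector here is symmetric, unlike the instrument's.

```python
from fractions import Fraction

# Weights of the fifth-order interpolant integrated over its middle interval.
W6 = [Fraction(v, 1440) for v in (11, -93, 802, 802, -93, 11)]

def composite_vector(n_intervals):
    """Coefficient vector for integrating over n_intervals whole sample periods."""
    vec = [Fraction(0)] * (n_intervals + 5)
    for i in range(n_intervals):
        for k, w in enumerate(W6):
            vec[i + k] += w
    return vec

def convolve(a, b):
    """Combine two coefficient vectors as if they were consecutive FIR filters."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

FIR = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]  # hypothetical, dc gain 1
combined = convolve(composite_vector(12), FIR)
```

Lengthening the integration interval adds more ones in the center of `combined` while the leading and ending portions keep the same length.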

Signal components not harmonically related to the measurement
frequency (f = 1/T) will not, in general, be attenuated
by as much as 70 dB. Fig. 5 represents the relationship
between attenuation of a frequency component of a
signal and frequency. To attenuate such nonharmonically
related signal components, the integration time MT can be
increased by increasing M.

Errors in the estimation of $c_1$ can also be produced by a
less than perfect sequence for $e^{-jn\omega T_s}$, say, $e^{-jn\omega T_s} + E_n$,
where $E_n$ is an error sequence. If s(t) has a significant dc
component $s_0$, so that $s(t) = s_0 + s_1\cos(\omega t + \phi)$, and $E_n$
has a significant dc component $E_0$, then the term $E_0 s_0$ will
appear in the Fourier series integral and not cancel out. In
the HP 3562A, the local oscillator signal $e^{-jn\omega T_s} + E_n$ has
$E_0 \le 3 \times 10^{-4}$, hence this is not a significant problem.

Fig. 1. Fifth-order polynomial integration.

Fig. 3. Integrating over a portion of a sample period.

Similarly, a stimulus signal with significant harmonic
components (and subharmonic components in the case of
a signal generated digitally and passed through a digital-to-analog
converter) can cause harmonic and intermodulation
distortion products from a nonlinear DUT to add spurious
signal components at the measurement frequency, which
will contribute directly to the error in estimating $c_1$ for a
pure stimulus signal. This problem is addressed by the
signal source design of the HP 3562A, which has no harmonic
or subharmonic signal terms greater than 60 dB
below the fundamental frequency component.
Sweep. The HP 3562A uses a stepped sine sweep rather
than a continuous sweep. The instrument determines the
frequency to be measured and sweeps the stimulus source
frequency to this determined value. The HP 3562A then
waits for transients to settle (usually 20% of the chosen
integration time) and then begins the measurement at that
frequency point.

Fig. 5. Attenuation of signal components versus frequency.

The frequency resolution of a sweep (selected by the
user) is the difference between adjacent frequencies in the
measurement. Since the frequencies are at discrete points,
important frequency response characteristics between
these points might be missed. For this reason, the HP 3562A
has an autoresolution mode. In this mode, the instrument
automatically takes finer frequency steps where necessary
so that the user can specify a coarse resolution to speed
up the measurement sweep without missing important information
about the frequency response. The HP 3562A
determines the frequency spacing based on the magnitude
of the (complex) difference between the frequency response
estimates of the last two measurements relative to the
geometric average of the magnitudes of the two frequency
responses. A large relative difference indicates a significant
change in magnitude or phase of the frequency response
over this frequency interval, and so the instrument decreases
the frequency step to the next measurement point.
After taking a measurement at a particular frequency point,
if the instrument determines it has stepped too far, it backs
up and takes the measurement again at a frequency closer
to the previous measurement frequency.
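
The autoresolution criterion can be written down directly: the magnitude of the complex difference between two successive frequency response estimates, divided by the geometric average of their magnitudes. The Python sketch below shows the calculation. The threshold of 0.1, the halving of the step, and the minimum step are hypothetical choices, since the text does not give the instrument's actual values or step rule.

```python
import math

def relative_change(h_prev, h_curr):
    """Complex difference relative to the geometric mean of the magnitudes."""
    return abs(h_curr - h_prev) / math.sqrt(abs(h_prev) * abs(h_curr))

def next_step(step_hz, h_prev, h_curr, threshold=0.1, min_step_hz=1.0):
    """Shrink the frequency step when the response is changing quickly."""
    if relative_change(h_prev, h_curr) > threshold:
        return max(step_hz / 2, min_step_hz)  # assumed rule: halve the step
    return step_hz

slow = next_step(100.0, 1 + 0j, 1.01 + 0j)  # small change: keep the step
fast = next_step(100.0, 1 + 0j, 0 + 1j)     # 90-degree phase change: shrink it
```

Because the difference is taken on complex values, a pure phase change with constant magnitude still triggers a finer step.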
Autointegrate. The nonharmonic noise rejection of the
Fourier series integration algorithm depends on the integration
time selected by the user. For measurements in
which the signal-to-noise ratio is not constant over the
frequency range, the user must select an integration time
that takes the lowest signal-to-noise ratio into account. Because
integration over a period this long is not required
over the entire frequency span, time will be wasted if this
integration time is used at each frequency point. The HP
3562A has an automatic integration feature that decides
how long to integrate (up to the user-entered integration
time) at each measurement frequency based on the effect
the noise has on the measured transfer function magnitude
at that frequency.

This is accomplished as follows. A small amount of data
is taken and the transfer function estimate is computed.
The same amount of data is taken again and the transfer
function is again computed. The two data segments are
contiguous; no data is missed. This is done three times,
and the normalized variance of the magnitude of the transfer
function is estimated. From this, the variance is estimated
for the error on the transfer function magnitude that
would result if it were calculated using a single Fourier
series integral over all the data. If this estimate is larger
than a user-entered level, another contiguous block of data
is taken and the procedure is repeated. Otherwise, the result
of the single Fourier series integral over all the data is used
to calculate the transfer function estimate.
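
A minimal Python sketch of this decision loop follows. It is not the instrument's algorithm: the function names are hypothetical, and the step that converts the block-to-block variance into the variance of the combined estimate assumes that integrating over k blocks reduces the variance by a factor of k.

```python
def normalized_variance(magnitudes):
    """Sample variance of the magnitudes divided by the squared mean."""
    n = len(magnitudes)
    mean = sum(magnitudes) / n
    var = sum((m - mean) ** 2 for m in magnitudes) / (n - 1)
    return var / mean ** 2

def blocks_needed(block_estimates, max_variance, min_blocks=3):
    """Number of contiguous blocks to integrate over before stopping."""
    for k in range(min_blocks, len(block_estimates) + 1):
        mags = [abs(h) for h in block_estimates[:k]]
        # Assumption: the variance of the combined estimate falls as 1/k.
        if normalized_variance(mags) / k <= max_variance:
            return k
    # Reached the user-entered integration time (all available blocks).
    return len(block_estimates)

quiet = blocks_needed([1.0, 1.0, 1.0, 1.0, 1.0], max_variance=0.01)
noisy = blocks_needed([1.0, 1.2, 0.8, 1.0, 1.0], max_variance=0.01)
```

A quiet frequency point stops after the first three blocks, while a noisy one keeps taking data until the estimated variance drops below the entered level or the block limit is reached.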
Autogain. For nonlinear devices, the transfer function will
be a function of the level of the input to the device. It may
be important to control the level of the signal at either the
input or the output of the DUT. The HP 3562A's automatic
gain feature allows this by adjusting the signal source level
such that the desired reference signal level is achieved.
The HP 3562A first estimates what the correct source level
would be based on the result at the previous measurement
frequency. The source is set at this level and a measurement
is taken. If the reference signal is estimated to be within
1 dB of the desired level, the HP 3562A goes on to the next
measurement frequency. If not, the HP 3562A adjusts the
source level, waits for transients to die out, and repeats
the measurement. The possibility of an infinite loop is
eliminated by the algorithm, which quits trying to adjust
the source level and goes on to the next frequency point
if the algorithm tries to move the source signal back toward
a previous level.
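
The autogain loop, including its guard against oscillating between levels, can be sketched in Python as below. The callable `measure_ref_level`, the proportional level update, and the `max_tries` limit are assumptions added for the example; the text specifies only the 1-dB window and the rule of giving up when an adjustment would reverse direction.

```python
import math

def autogain(measure_ref_level, desired_level, start_source, max_tries=20):
    """Adjust the source level until the reference signal is within 1 dB."""
    source = start_source
    last_direction = 0
    for _ in range(max_tries):
        ref = measure_ref_level(source)
        error_db = 20 * math.log10(desired_level / ref)
        if abs(error_db) <= 1.0:
            break  # close enough: go on to the next frequency point
        direction = 1 if error_db > 0 else -1
        if last_direction and direction != last_direction:
            break  # would move back toward a previous level: quit adjusting
        source *= 10 ** (error_db / 20)  # assumed proportional correction
        last_direction = direction
    return source

# Hypothetical linear DUT whose reference level is half the source level.
level = autogain(lambda s: 0.5 * s, desired_level=1.0, start_source=1.0)
```

For the linear device in this example a single adjustment is enough, and the loop exits on the next measurement.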

Log Resolution Mode 

The logarithmic resolution mode of the HP 3562A makes
frequency-domain measurements at logarithmically spaced
frequency points (lines) instead of the linearly spaced
points measured by a standard linear resolution FFT measurement.
This results in more measurements at the lower
frequencies than at the higher frequencies (as shown in
Fig. 6a), which is exactly what we want when trying to
understand the response of a system over a large span.
When the measured points are plotted on a log frequency
axis, they are spaced linearly over the entire measurement
span (see Fig. 6b). Hence, a single measurement provides
an excellent understanding of the response of a system
over a wide frequency range (up to five decades in the case
of the HP 3562A). This Bode-plot type of measurement is
especially useful when measuring electronic filters and
dynamic control systems.

Measurement Points. Another way to say that the log resolution
lines are logarithmically spaced is to say that they
are proportionally spaced. That is, the ratio between the
locations of any two adjacent lines is a constant. If we call
this constant k, we can relate the locations of adjacent lines
by:

$$f_c(m+1) = k f_c(m)$$

Applying this equation recursively yields:

$$f_c(m+n) = k^n f_c(m)$$

Since the log resolution mode of the HP 3562A provides
a fixed resolution of 80 points per decade, we know that
$f_c(m+80) = k^{80} f_c(m)$ and that $f_c(m+80) = 10 f_c(m)$. Taken
together, these equations state:

$$f_c(m+n) = 10^{n/80} f_c(m)$$

If we define the center frequency of the first measured
point to be $f_{st}$ (start frequency) and let m = 0 at this frequency,
then we see that the nth line is defined by:

$$f_c(n) = 10^{n/80} f_{st} \qquad (1)$$

If we now define the upper band edge of the nth line to
be halfway between $f_c(n)$ and $f_c(n+1)$ and the lower band
edge of the nth line to be halfway between $f_c(n)$ and $f_c(n-1)$
when plotted on a log frequency axis, we can define the
bandwidth of line n to be:

$$f_{bw}(n) = f_{upper}(n) - f_{lower}(n)$$

Since the halfway point on a log axis is the geometric
mean on a linear axis, we have:

$$f_{bw}(n) = \sqrt{f_c(n)\,f_c(n+1)} - \sqrt{f_c(n-1)\,f_c(n)}$$

Substituting the center frequency definition derived
above (Equation 1) yields:

$$f_{bw}(n) = f_c(n)\left(10^{1/160} - 10^{-1/160}\right) = 0.0288 f_c(n)$$

which shows that bandwidth is a percentage of the center
frequency.
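
The center frequency and bandwidth relations can be checked numerically. The short Python sketch below uses hypothetical function names and an arbitrary start frequency of 100 Hz; it computes the band edges as geometric means of adjacent center frequencies, as described above.

```python
def center_frequency(n, f_start):
    """Center frequency of log resolution line n, 80 lines per decade."""
    return f_start * 10 ** (n / 80)

def bandwidth(n, f_start):
    """Bandwidth of line n, with band edges halfway on a log axis."""
    upper = (center_frequency(n, f_start) * center_frequency(n + 1, f_start)) ** 0.5
    lower = (center_frequency(n - 1, f_start) * center_frequency(n, f_start)) ** 0.5
    return upper - lower

f_start = 100.0
ratio = bandwidth(10, f_start) / center_frequency(10, f_start)
decade = center_frequency(80, f_start) / f_start
```

The ratio of bandwidth to center frequency is the same for every line, about 2.88%, and 80 lines span exactly one decade.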

Single-Decade Measurements. The difference between a
log swept sine measurement and a log resolution measurement
is that while the former is computed a point at a time,
the latter uses the FFT to compute a decade at a time.

The first step in making a single-decade log resolution
measurement is to produce a linear spectrum for each channel.
To do this, a 2048-point time block is taken with the
low-pass filters set to a span equal to the stop frequency
of the desired measurement. A Hanning window is then
applied to the time domain data and two FFTs are performed,
yielding the linear spectra $S_x$ and $S_y$.

Now three results are computed: Channel 1 power spectrum
$G_{xx}$, Channel 2 power spectrum $G_{yy}$, and the cross-power
spectrum $G_{yx}$. Averaging the cross-power spectrum
rather than computing the frequency response directly is
called trispectral averaging and is described in appendix
A of reference 2.

For each log resolution line within the decade, the linear
resolution measurements between $f_{lower}(n)$ and $f_{upper}(n)$ are
added together. The result of these summations is the log
resolution measurement for this line. Fig. 7 shows the shape
of a representative log resolution line and it can be seen
that there is no perceptible ripple in the bandpass area.

Fig. 6. Log and linear resolution modes. Black arrows indicate
linear resolution measurement points, colored arrows indicate log
resolution measurement points.

The summing process is repeated for each of the 80 log
resolution lines. Since the linear resolution points being
summed have a constant spacing, the number of points
used to form each line will increase as the line center
frequency increases. Since the roll-off of the line shape is
related to the Hanning window that was applied to the
linear resolution points, the roll-off becomes steeper (with
respect to log f) for lines at higher frequencies.
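
The summing process can be illustrated with a Python sketch that groups linear resolution points into log resolution lines. The details are assumptions: the function name, the use of 801 linear points with 1-Hz spacing, the decade starting at 80 Hz, and the rule that a point belongs to a line when its frequency lies in the half-open interval from the lower to the upper band edge.

```python
def log_lines_from_linear(power, delta_f, f_start, n_lines=80):
    """Sum linear resolution power points into log resolution lines.

    Returns a list of (center frequency, number of points, summed power).
    """
    lines = []
    for n in range(n_lines):
        f_c = f_start * 10 ** (n / 80)
        lower = f_c * 10 ** (-1 / 160)  # geometric mean with the line below
        upper = f_c * 10 ** (1 / 160)   # geometric mean with the line above
        bins = [k for k in range(len(power)) if lower <= k * delta_f < upper]
        lines.append((f_c, len(bins), sum(power[k] for k in bins)))
    return lines

power = [1.0] * 801  # flat (white) linear power spectrum, hypothetical data
lines = log_lines_from_linear(power, delta_f=1.0, f_start=80.0)
counts = [count for _, count, _ in lines]
```

With a flat input spectrum the summed power of each line simply equals its point count, which grows from a few points at the bottom of the decade to a few tens at the top.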
Multidecade Measurements. When a span of more than
one decade is selected, the above procedure is applied once
per decade. In the multidecade case there is one important
difference. Each channel of the HP 3562A contains two
digital filters (required for filtering the complex time data
produced by zoomed linear resolution measurements), and
the digital filter controller hardware is intelligent enough
to allow each filter to be set independently. Therefore,
measurements on two decades can be made in parallel.
The collection of data in log resolution mode is always
done on a baseband span (i.e., it has a 0-Hz start frequency).
In the multidecade case, the data for the first decade (lowest
frequency) is processed by one filter and the data for the
remaining decades is processed sequentially by the other
filter. In addition to the parallel data collection and processing,
overlap processing can also be done on the first decade
(which has the longest time record) to speed up the measurement
even more.

Multidecade log resolution measurements should be used 
only for measuring stationary signals, because the data rec- 
ords for the various decades are not collected simultane- 
ously. This is also the reason for disallowing triggered mea- 
surements in log resolution mode. 

Power Spectrum Accuracy. It was pointed out above that
each log resolution line is formed by combining an integer
number of linear resolution points. It should be apparent
that this causes the actual bandwidth of each line to vary
from the ideal bandwidth for that line. Frequency response
measurements are not affected by this deviation since the
amount of deviation is the same in each channel. However,
it is noticeable when a power spectrum measurement of a
broadband signal is made. The amplitude of the resulting
spectrum (compared to the ideal result) would be too high
at lines where the bandwidth is greater than ideal (because
the line measures over a wider band than ideal) and
too low at lines where the bandwidth is less than ideal.

Fig. 7. Representative log resolution line shape.

To resolve this problem, the power spectrum is "corrected."
This is done by first dividing the measured power
of each line by the actual bandwidth of that line (thus
producing the power spectral density of the line) and then
multiplying by the ideal bandwidth of that line. Although
this technique works well when measuring broadband signals,
it does mean that log resolution measurements will
not yield accurate results when measuring a narrowband
(e.g., fixed sine) signal. The error in this case can be as
much as +1.7 dB, -2.3 dB above the basic measurement
accuracy and Hanning window uncertainty.
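
The bandwidth correction amounts to one division and one multiplication per line. In the Python sketch below, the function name is hypothetical and the actual bandwidth is taken to be the number of summed linear points times their spacing, which is an assumption that ignores the shape of the Hanning window.

```python
def corrected_line_power(measured_power, n_bins, delta_f, f_c):
    """Rescale a log resolution line from its actual to its ideal bandwidth."""
    actual_bw = n_bins * delta_f  # assumed: integer number of linear points
    ideal_bw = f_c * (10 ** (1 / 160) - 10 ** (-1 / 160))
    density = measured_power / actual_bw  # power spectral density of the line
    return density * ideal_bw

# Broadband example: a flat density of 2.0 units/Hz summed over 3 points
# spaced 1 Hz apart, for a line centered at 80 Hz.
psd = 2.0
measured = 3 * 1.0 * psd
corrected = corrected_line_power(measured, n_bins=3, delta_f=1.0, f_c=80.0)
```

For a broadband signal the corrected value equals the density times the ideal bandwidth. For a single sine tone the same scaling is applied even though the tone's power does not depend on bandwidth, which is the source of the narrowband error described above.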

All of this implies that when using log resolution mode, 
the user should keep its purpose in mind: to provide accu- 
rate measurements on stationary broadband signals over a 
wide span. 

Linear Resolution Mode 

In its FFT-based linear resolution mode, the HP 3562A
can perform a broad range of frequency, time, and
amplitude domain measurements:

■ Frequency: linear spectrum, power spectrum, cross
spectrum, frequency response, and coherence measurements

■ Time: autocorrelation, crosscorrelation, and impulse response
measurements

■ Amplitude: histogram, probability density function
(PDF), and cumulative distribution function (CDF).

The frequency domain measurements are made with 800
spectral lines of resolution over the selected frequency span
in either single-channel or dual-channel operation. The
custom digital filtering hardware allows the selected frequency
span to be as narrow as 10.24 mHz for baseband
measurements, or 20.48 mHz for zoom measurements. The
center frequency for zoom measurements can be set with
a resolution of 64 µHz anywhere within the instrument's
overall frequency range.

Averaging can best be described in terms of the quantity
being averaged and the type of averaging being performed.
If time averaging is selected, then the quantity being averaged
is the linear spectrum of the input time record. The
averaging is done in the frequency domain so that the calibration
and trigger correction can be a simple block-multiply
operation. However, this is essentially the same
as averaging the time records. Time averaging is useful for
improving the signal-to-noise ratio when an appropriate
trigger signal is available. A time-averaged frequency response
function is computed by simply dividing the averaged
output linear spectrum by the averaged input linear
spectrum.

If time averaging is not selected, then the quantity being
averaged is the selected measurement function itself (e.g.,
power spectrum, autocorrelation, etc.). If a frequency response
measurement is selected, then a trispectrum average
is performed, where the averaged quantities are the input
power spectrum $G_{xx}$, the output power spectrum $G_{yy}$, and
the cross spectrum $G_{yx}$. Power averaging is useful for reducing
the variance of a spectral estimate.

The type of averaging depends on the algorithm applied
to the average quantity. Stable averaging is simply the sum
of the averaged quantities divided by the number of averages
(all data is weighted equally). Exponential averaging
results in a moving average in which new data is weighted
more than previous data (useful for dynamically changing
signals). For frequency domain measurements, an additional
type of averaging, peak hold, is provided. Peak hold
results in a composite spectrum made up of the maximum
value that has occurred in each of the 800 spectral lines.
This type is useful for detecting noise peaks, signal drift,
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Demodulation Example 



Fig. 1 shows the instrumentation for measuring the distortion
of an ordinary cone loudspeaker, in which both amplitude and
phase distortion occur. The technique is to apply the sum of a
low-frequency voltage and a midrange-frequency voltage to the
speaker, and then to demodulate the components that appear
around the midrange tone. Here, the midrange signal becomes
the carrier, and the low-frequency tone generates the modulation
sidebands.



In any speaker system of this type, the cone suspension acts
like a nonlinear spring that tends to stiffen at the extremes of
cone excursion. This causes some degree of amplitude modulation
of the midrange signal by the low-frequency signal. In addition,
because of the motion of the cone, there will be a Doppler
shift of the midrange frequency by the low-frequency component.
This will show as frequency or phase modulation. Even though
these two types of intermodulation distortion are caused by different
mechanisms, their waveforms are still very much related
in time.

Fig. 1. The measurement of loudspeaker intermodulation distortion.
When the sum of a low-frequency sinusoidal signal and a
higher-frequency sinusoidal signal is applied to a speaker, various
nonlinearities cause modulation of the higher-frequency signal by the
low-frequency signal. Both phase and amplitude modulation will
generally occur simultaneously.

Fig. 2. Power spectrum of microphone signal showing 20-Hz
modulation sidebands around the 1200-Hz carrier.

Fig. 2 shows the power spectrum of the speaker output signal
centered around the midrange frequency, as measured by the
microphone. Fig. 3 shows the demodulated time waveforms for
amplitude and phase using the midrange frequency as the carrier.
Finally, Fig. 4 shows the HP 3562A's demod-polar plot, in
which both types of distortion are shown simultaneously. The
origin for the carrier vector is at the left of the plot, and only the
tip locus is shown. The carrier vector amplitude varies coherently
with the phase modulation, even though these variations tend to
be caused by totally different types of nonlinearity.

Ronald W. Potter

Consultant 



or maximum vibration at each frequency. 
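
The three averaging types can be compared with a small Python sketch operating on lists of spectral line values. The function names are illustrative, and the exponential update rule shown (move the average toward each new record by 1/weight of the difference) is an assumed form, since the text does not give the instrument's weighting formula.

```python
def stable_average(records):
    """Equal weighting: sum of the records divided by their number."""
    n = len(records)
    return [sum(line) / n for line in zip(*records)]

def exponential_average(records, weight):
    """Moving average in which newer records count more (assumed update rule)."""
    avg = list(records[0])
    for rec in records[1:]:
        avg = [a + (x - a) / weight for a, x in zip(avg, rec)]
    return avg

def peak_hold(records):
    """Maximum value that has occurred in each spectral line."""
    return [max(line) for line in zip(*records)]

# Three hypothetical power spectra with two spectral lines each.
records = [[1.0, 4.0], [3.0, 2.0], [2.0, 6.0]]
stable = stable_average(records)            # [2.0, 4.0]
expo = exponential_average(records, 2)      # [2.0, 4.5]
peaks = peak_hold(records)                  # [3.0, 6.0]
```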

The HP 3562A's built-in source provides a variety of
output signals to satisfy many different network measurement
requirements. The random noise output is a broadband,
truly random signal. Since it is nonperiodic, it has
continuous energy content across the measurement span
and an appropriate window (e.g., a Hanning window) must
be applied to reduce leakage. Averaging the random noise
input and response signals reduces the effects of nonlinearities
in the network under test. This, in effect,
linearizes the network's frequency response function. By
using random noise as a stimulus, the response function
calculated from a trispectrum average (Gyx/Gxx) is a linear
least-squares estimate of the network frequency response.
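
As an illustration of the averaged estimate described above, the following is a minimal sketch, not the analyzer's firmware. It assumes a simulated three-tap network, additive output noise, and 200 Hanning-windowed records; the function name `frf_from_averages` and all numeric values are hypothetical choices made for this example.

```python
import numpy as np

def frf_from_averages(x, y, nfft):
    """Estimate H(f) = Gyx/Gxx from averaged, Hanning-windowed records."""
    window = np.hanning(nfft)
    gxx = np.zeros(nfft // 2 + 1)                 # averaged input auto spectrum
    gyx = np.zeros(nfft // 2 + 1, dtype=complex)  # averaged cross spectrum
    for start in range(0, len(x) - nfft + 1, nfft):
        xf = np.fft.rfft(window * x[start:start + nfft])
        yf = np.fft.rfft(window * y[start:start + nfft])
        gxx += np.abs(xf) ** 2
        gyx += yf * np.conj(xf)
    return gyx / gxx

# Hypothetical network under test: a short FIR response plus output noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(256 * 200)
h_true = np.array([0.5, 0.3, 0.2])
y = np.convolve(x, h_true)[:len(x)] + 0.05 * rng.standard_normal(len(x))

h_est = frf_from_averages(x, y, 256)
h_ref = np.fft.rfft(h_true, 256)
max_error = np.max(np.abs(h_est - h_ref))
```

With more averages the estimate converges toward the linear part of the network response, which is the sense in which averaging "linearizes" the measurement.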

The periodic chirp output is essentially a fast sine sweep 
wholly contained within the measurement time record. 
This output is also a broadband signal, but since it is 



periodic, leakage-free measurements can be made without 
windowing. Thus, a single, nonaveraged measurement 
using the periodic chirp signal gives accurate results in 
many linear applications. The periodic chirp also has a 
higher rms-to-peak ratio than stimuli using random noise. 

The HP 3562A also provides a burst random noise output 
that combines many of the benefits of the above source 
outputs. The random noise output is gated on for a user- 
selectable percentage of the measurement time record. If 
the network response signal decays to a negligible level 
within the time record, then a leakage-free measurement 
can be made without windowing. Since the signal is trun- 
cated random noise, it has continuous rather than discrete 
energy across the frequency span. 

The periodic chirp output can also be operated in a burst
mode. All of these broadband signals are band-limited to
the measurement frequency span, even for zoomed operation.
In addition to these broadband signals, a narrowband
signal, fixed sine, is available for stimulating a network at
a single frequency.

Fig. 3. (Top) Average of ten demodulated AM time
waveforms. The modulation index is approximately 12%. The
endpoints are set to zero to show carrier amplitude. (Bottom)
Average of ten demodulated PM time waveforms showing
approximately 50 degrees of peak-to-peak phase modulation.
The endpoints are set to zero phase.

Fig. 4. Demod-polar plot. Approximately three cycles of the
locus of the tip of the modulated carrier vector are shown
after ten demodulated time record averages. The dashed line
shows the locus of constant carrier amplitude. The marker
on this line shows the zero phase point, corresponding to an
unmodulated carrier.

A special source protect feature is provided for those
applications where an abrupt change in excitation power
level can cause damage to the device under test. When this
feature is invoked, it provides two types of protection. First,
when the source level is changed, the output ramps to the
new level instead of changing immediately. Second, if the
source is on and the user changes any parameters that affect
the source output (e.g., center frequency, span, or source
type), then the source output ramps down to zero volts.

Within the HP 3562A, the demodulation process is per- 
formed digitally rather than by more conventional analog 
methods. This process (see following section) can be
thought of as a measurement preprocessor in which the 
input is a modulated carrier and the output is a time record 
of only the modulating signals. The demodulated AM, FM, 
or PM time records can then be used as inputs to any of 
the measurements available in the linear resolution mode. 

For applications where the carrier frequency may not be
known, there is an autocarrier mode in which the demodulation
process automatically determines the carrier frequency
required to demodulate the input signal correctly.

Digital Demodulation 

In many applications, a signal spectrum will have a narrow
bandwidth, and the information of interest is carried
by the modulation of that signal. A carrier can either be
amplitude or phase modulated, and both types of modulation
can coexist. Frequency modulation is essentially the
same as phase modulation, with the frequency-versus-time
modulation waveform being the time derivative of the
equivalent phase-versus-time waveform.

In general, the user would like to demodulate any such 
narrowband signals, and would like to separate amplitude 
modulation from phase or frequency modulation. In addi- 
tion, the amplitude and/or phase of the carrier may also be 
of interest. In the HP 3562A, either the modulation time 
waveforms or their equivalent frequency spectra can be 
obtained, and the two types of modulation can be displayed 
together in a polar plot to show any relationships that might 
exist between the two.

Demodulation Equations. Any narrowband signal can be
represented as a modulated carrier, and expressed in the
following form:

$$x(t) = m(t)e^{j\omega_0 t} + m^*(t)e^{-j\omega_0 t}$$

where the carrier angular frequency is ω₀ in radians/second
and m(t) is the complex modulating waveform given by:

$$m(t) = A[1 + a(t)]e^{j\phi(t)}$$

The carrier amplitude and phase are represented by the
complex number A, while a(t) and φ(t) represent the
amplitude and phase modulating waveforms, respectively.
The quantity m*(t) represents the complex conjugate of
m(t). Thus, x(t) is a real modulated waveform formed by
the sum of a complex waveform and its complex conjugate.

As long as the spectra for the two parts of x(t) do not
overlap in the frequency domain, it is possible to separate
the amplitude and phase modulation components unambiguously.
There are several ways to accomplish this, but
one of the simplest ways is to construct a single-sided filter
that only passes the positive frequency image. This is done
in the HP 3562A by digitally shifting the positive image of
the signal spectrum down to a frequency band near the
origin, and then passing the resulting complex time
waveform through a digital low-pass filter.

Assuming that the negative frequency image has been
completely rejected by the low-pass filter, the output time
waveform will contain only the positive spectral image,
and can be written as:

$$y(t) = m(t)e^{j\omega_0 t} = A[1 + a(t)]e^{j[\omega_0 t + \phi(t)]} = u(t) + jv(t)$$



there are a number of practical details that must be considered
in the implementation of these equations.

The data flow block diagram for this process is shown
in Fig. 8. The demodulation is introduced immediately
after the digital antialiasing filter, and just before the various
measurement options are selected. Thus, most of the
normal types of measurements that can be made on input
time records can also be made on demodulated time records.
This includes ensemble averaging, transfer and coherence
function calculations, correlation functions, histograms,
and power spectra. The type of demodulation can
be selected separately for each of the HP 3562A's two input
channels, so it is possible to demodulate amplitude and
phase (or frequency) simultaneously.

If any parts of the original positive and negative fre- 
quency spectral images overlap, then it is not possible to 
separate the two modulating waveforms unambiguously. 
If demodulation is attempted in this case, the resulting 
waveforms will not be entirely correct. This overlap can 
occur about the frequency origin if the original modulation 
bandwidth is excessive, or it can occur around half the 
sampling frequency because of aliasing. 

Any spectral components in the band near the carrier
are assumed to be modulation sidebands around that carrier.
If this is not the case, errors will be introduced into
the demodulated waveforms. For example, if there is an
additive spectral component at the ac line frequency, this
component must be removed before demodulation is attempted,
or there will be line frequency components in
both the amplitude and phase waveforms. If dc or any
carrier harmonics are present in the original spectrum,
these must also be removed, or else they will appear as


Here, u(t) and v(t) are real time waveforms, and are the real
and imaginary parts of y(t), respectively.

The magnitude of y(t) gives the amplitude modulation
waveform, while the phase of y(t) gives the phase modulation,
including the carrier ramp ω₀t. Hence:



$$\sqrt{u(t)^2 + v(t)^2} = |A[1 + a(t)]| \qquad (2)$$

and

$$\tan^{-1}[v(t)/u(t)] = \angle A + \omega_0 t + \phi(t) \qquad (3)$$

If Equation 3 is differentiated with respect to time, the
carrier phase disappears, giving:

$$\frac{d}{dt}\tan^{-1}[v(t)/u(t)] = \omega_0 + d\phi/dt \qquad (4)$$



The carrier frequency ω₀ is estimated by calculating the
average value of Equation 4 weighted by a Hanning window
to reduce errors caused by leakage effects. When this carrier
term is removed, the remainder is the frequency modulating
waveform. This quantity is then integrated with respect
to time, obtaining the phase modulating waveform φ(t) with
all carrier components removed. A similar technique of
weighted averaging is used in Equation 2 to define the
carrier amplitude A so that the amplitude modulating
waveform a(t) can be obtained.
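
The demodulation steps above can be sketched numerically. This is an illustrative sketch only: it isolates the positive-frequency image with an FFT mask rather than the analyzer's frequency shift and digital low-pass filter, and the function name `demodulate`, the 1200-Hz carrier, and the modulation depths are assumptions chosen for the example.

```python
import numpy as np

def demodulate(x, fs):
    """Return (carrier_hz, am, pm) for a real narrowband record x."""
    n = len(x)
    # Keep only the positive-frequency image (stand-in for shift + low-pass).
    spectrum = np.fft.fft(x)
    spectrum[0] = 0.0
    spectrum[n // 2:] = 0.0
    y = np.fft.ifft(spectrum)              # proportional to m(t) exp(j w0 t)

    weights = np.hanning(n)
    envelope = np.abs(y)                   # Equation 2
    phase = np.unwrap(np.angle(y))         # Equation 3
    inst = np.gradient(phase) * fs         # Equation 4: w0 + dphi/dt, rad/s

    w0 = np.average(inst, weights=weights)            # Hanning-weighted carrier
    carrier_amp = np.average(envelope, weights=weights)
    am = envelope / carrier_amp - 1.0      # a(t)
    pm = np.cumsum(inst - w0) / fs         # phi(t), up to a constant offset
    return w0 / (2 * np.pi), am, pm

# Hypothetical test signal: 1200-Hz carrier, 20-Hz AM (12%) and PM (0.4 rad peak).
fs = 8192.0
t = np.arange(8192) / fs
a = 0.12 * np.sin(2 * np.pi * 20 * t)
phi = 0.4 * np.sin(2 * np.pi * 20 * t + 1.0)
x = (1 + a) * np.cos(2 * np.pi * 1200 * t + phi)

carrier_hz, am, pm = demodulate(x, fs)
```

Note that the carrier frequency is never supplied to `demodulate`; it is recovered from the weighted average of the instantaneous frequency, in the spirit of the autocarrier mode.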

Digital Demodulation in Practice. Although the basic demodulation
equations described above are straightforward,
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modulation components at the carrier frequency. 

The HP 3562A has provisions for removing selected regions
of the spectrum before the demodulation process.
There is a preview mode, in which the original spectrum
can be displayed before the demodulation step. A number
of frequency regions can be selected, within which the
spectrum is replaced by a straight line segment connecting
the end points of each region. In this manner, narrow bands
of contaminating interference can be effectively removed
from the data before demodulation is initiated.
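
The region-replacement idea can be sketched in a few lines. This is an assumption-laden illustration: the article does not say whether the line segment is drawn through the complex spectrum or its magnitude, so the sketch interpolates the complex bins directly, and `bridge_region` is a hypothetical helper name.

```python
import numpy as np

def bridge_region(spectrum, lo, hi):
    """Replace bins lo..hi (inclusive) by a straight line between the end points."""
    out = np.array(spectrum, dtype=complex)
    out[lo:hi + 1] = np.linspace(out[lo], out[hi], hi - lo + 1)
    return out

# Hypothetical flat spectrum with a narrow interfering line at bin 8.
spectrum = np.ones(16, dtype=complex)
spectrum[8] = 50.0
cleaned = bridge_region(spectrum, 6, 10)
```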

If the original signal passes through any filter whose
passband is not constant with frequency, there will be some
conversion of one type of modulation to the other. Thus,
it is very important that the filter transfer characteristics
be completely removed from the signal before demodulation
is attempted. This is particularly important for any
phase slope (or group delay variation) introduced by the
filter, since most antialiasing filters have large group delay
changes, especially near the cutoff frequency, even though
the passband amplitude characteristic may be very flat.
There is an automatic system calibration cycle in the HP
3562A, during which the filter transfer function is measured.
It is then automatically removed from each record
of data before demodulation is initiated. This keeps
intermodulation conversion errors below a nominal level
of about -50 dB.

Since it is possible to measure both amplitude and phase
modulation waveforms simultaneously, there is a special
display mode in which these two quantities can be shown
together. A modulation process can be envisioned as a
variation in the amplitude and/or phase of a carrier vector
(phasor). Thus, the locus of the tip of this vector can be
used to describe the modulation. The HP 3562A's demod-polar
display mode shows this locus using rectangular
coordinates. This allows any mutual relationships that
might exist between amplitude and phase modulation
waveforms to be shown. This can be very useful, for example,
when the causes of various types of intermodulation
distortion are under investigation, as illustrated by the
example given in the box on page 22.
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Analyzer Synthesizes Frequency 
Response of Linear Systems 



by James L. Adcock

A COMPLETE CAPABILITY for synthesizing the frequency
response of linear systems based on their
pole-zero, pole-residue, or polynomial model is included
in the HP 3562A Signal Analyzer. This synthesis
capability includes table conversion, the ability to convert
automatically between the three models. The frequency
responses can be frequency scaled and system time delays
can be added. The designed system frequency responses
are synthesized with exactly the same frequency points as
those used by the corresponding HP 3562A measurement
mode. Hence, the synthesized version of the desired frequency
response can be directly compared to the measured
response of the actual system.

Pole-zero formulation corresponds to the series form of
filter design most often used by electrical engineers.
Pole-residue form is commonly used by mechanical engineers
and in modal analysis, and corresponds to parallel design
filters in electrical engineering. All three forms, pole-zero,
pole-residue, and polynomial, find direct application in
analysis and design of servo systems. Having all three synthesis
forms available in the HP 3562A, users can select
the formulation best suited for their application. The HP
3562A also facilitates the solution of mixed-domain problems,
such as electromechanical servo design, which requires
a variety of electrical and mechanical design techniques.

Synthesis Table Conversion 

Let us try table conversions on the following simple
example with two poles at -1 ± j10 and a zero at -2. The
response of this filter is shown in Fig. 1a. The pole-zero
equation is:

$$\frac{s + 2}{(s + 1 - j10)(s + 1 + j10)}$$

which is represented by the HP 3562A's display as shown
in Fig. 1b. The pole-zero diagram is shown in Fig. 1c.

If the HP 3562A converts the equation to polynomial
form, the numerator and denominator terms are simply
multiplied out:

$$\frac{s + 2}{s^2 + 2s + 101}$$

Alternatively, the pole-zero or polynomial representation
can be converted to pole-residue form:

$$\frac{a}{s + 1 - j10} + \frac{a^*}{s + 1 + j10}$$

where the HP 3562A solves for a = 0.5 - j0.05 as expected.
Trying a slightly more difficult example with repeated
poles:

$$\frac{s + 2}{(s + 1 + j10)^2 (s + 1 - j10)^2}$$

Converting to polynomial form leads to the expected
result (Fig. 2a):

$$\frac{s + 2}{s^4 + 4s^3 + 206s^2 + 404s + 10201} \qquad (1)$$
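
The table conversions for this example can be checked numerically. The following is a sketch using NumPy's polynomial helpers rather than the analyzer's own routines; the helper name `residue_at` is hypothetical, and the residue formula N(p)/D'(p) applies only to simple (non-repeated) poles.

```python
import numpy as np

zeros = [-2.0]
poles = [-1 + 10j, -1 - 10j]

# Pole-zero to polynomial: multiply out the factors.
num = np.real_if_close(np.poly(zeros))   # s + 2
den = np.real_if_close(np.poly(poles))   # s^2 + 2s + 101

def residue_at(pole):
    """Residue of num/den at a simple pole: N(p) / D'(p)."""
    return np.polyval(num, pole) / np.polyval(np.polyder(den), pole)

a = residue_at(poles[0])                 # residue for the term over (s + 1 - j10)

# Repeated-pole example: each pole appears twice in the denominator.
den_repeated = np.real_if_close(np.poly(poles * 2))
```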
Fig. 1. (a) Response of a filter with two poles at -1 ± j10 and
a zero at -2. (b) Poles and zeros table displayed by HP
3562A for (a). (c) Pole-zero diagram for (a).



However, how is this function represented in pole-residue
form? Again, the HP 3562A has the answer (Fig. 2b):

$$\frac{-j250\times10^{-6}}{s + 1 - j10} + \frac{j250\times10^{-6}}{s + 1 + j10} + \frac{-2.5\times10^{-3} - j25\times10^{-3}}{(s + 1 - j10)^2} + \frac{-2.5\times10^{-3} + j25\times10^{-3}}{(s + 1 + j10)^2} \qquad (2)$$



A more interesting pole-residue case occurs when there
are more zeros than poles:

$$\frac{(s + 1)(s + 2)(s + 1 - j5)(s + 1 + j5)}{(s + 1 - j10)(s + 1 + j10)}$$

This results in extra Laurent expansion terms, isolated
powers of s that did not appear in the original pole-zero
form (see Fig. 2c):

$$\frac{-37.5 - j375}{s + 1 - j10} + \frac{-37.5 + j375}{s + 1 + j10} - 73 + 3s + s^2 \qquad (3)$$
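
When there are more zeros than poles, the isolated powers of s are the quotient of a polynomial division and the residues come from the remainder. The sketch below checks this with NumPy; it is an independent calculation, not the analyzer's routine, and the variable names are illustrative.

```python
import numpy as np

num = np.real_if_close(np.poly([-1, -2, -1 + 5j, -1 - 5j]))
den = np.real_if_close(np.poly([-1 + 10j, -1 - 10j]))

# Quotient gives the Laurent terms; remainder feeds the residues.
quotient, remainder = np.polydiv(num, den)

pole = -1 + 10j
residue = np.polyval(remainder, pole) / np.polyval(np.polyder(den), pole)
```

Here `quotient` holds the coefficients of s², s, and the constant term in that order, so the constant term comes out negative.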
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The implementation of the synthesis table conversions 
in the HP 3562A was straightforward, except for keeping
track of a multitude of details. The pole-residue form requires
many different cases. A consistent representation of
the many forms had to be designed within the constraints
of the HP 3562A's displayable character set. The necessary
zero-finder routine is shared with the curve-fitting algorithm
(see article on page 33).

Many table conversions result in small residual error
terms after conversion. To keep from displaying the error
terms, the table conversion routines estimate the errors in
their arithmetic as they proceed. If a term is as small as
the expected error for that table conversion, then the table
conversion routines assume that the term is indeed caused
by arithmetic errors and set the term to zero. This allows
most table conversions to work exactly as expected. For
example, converting a polynomial table to a pole-zero table
will give a zero at 0 Hz, if that is where it belongs, not at
2.31×10⁻²³. Also, converting a pole-zero table with Hermitian
symmetry to rational polynomial form will result in
purely real polynomials.
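
A simplified version of this clean-up can be sketched as follows. The analyzer estimates its arithmetic error as it goes; this sketch instead assumes a fixed relative tolerance, and both the function name `chop` and the tolerance value are hypothetical.

```python
import numpy as np

def chop(values, rel_tol=1e-10):
    """Zero any real or imaginary part that is tiny relative to the largest term."""
    values = np.asarray(values, dtype=complex)
    scale = np.max(np.abs(values))
    real = np.where(np.abs(values.real) < rel_tol * scale, 0.0, values.real)
    imag = np.where(np.abs(values.imag) < rel_tol * scale, 0.0, values.imag)
    return real + 1j * imag

# Illustrative table entries carrying rounding residue.
raw_zeros = np.array([2.31e-23 + 0j, -2.0 + 1e-17j])
clean_zeros = chop(raw_zeros)
```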

Another problem is caused by the large numbers often
encountered in polynomials. For example, (s - 10,000)¹⁰
converted to polynomial form results in numbers as large
as 10⁴⁰. This will lead to numerical problems. We solve
this problem in several ways. First, a scale frequency parameter
is introduced to allow designers to design in units
that are appropriate for their particular design problem.
For many designs, choosing one hertz or one radian as the
design unit is no more applicable than designing electrical
filters using ohms, farads, and henries (most practical designs
use kΩ, μF, and mH). Likewise, many filter designs
are more naturally expressed in kHz or kiloradians, or
normalized to the corner frequency or the center frequency of
the filter bandwidth. Second, the formulas used for table
conversions and frequency response synthesis were carefully
designed to try to minimize numerical problems. Finally,
if numerical problems cannot be avoided, the HP
3562A table conversion routines attempt to diagnose the
error, and warn the user of the numerical problems.

Fig. 3. Table (a) and synthesized response (b) for fifth-order
Chebyshev polynomial (equation 4).

Designing a Chebyshev Low-Pass Filter 

For a simple example of using the synthesis and curve-fitting
capabilities of the HP 3562A, let's construct an equiripple
low-pass filter. The magnitude of the response of this
type of filter is equal to 1/|1 + jεT_k(ω)|, where T_k(ω) is the
kth-order Chebyshev polynomial. Using the Chebyshev recursion
relationship, we can quickly arrive at, for example,
a fifth-order Chebyshev polynomial:

$$T_5(\omega) = 16\omega^5 - 20\omega^3 + 5\omega$$

This polynomial oscillates between ±1 over the frequency
range of ω = ±1. We choose ε = 1.0, resulting in a passband
ripple of 3 dB. This function can be synthesized using the
HP 3562A's polynomial synthesis capability (Fig. 3).

Since the HP 3562A polynomials are expressed in terms
of jω, the filter function synthesized is actually:



[Fig. 4. (a) Curve-fit table of poles and zeros (ten poles). (b) Resynthesized pole-zero response, 0 to 20 kHz.]
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Ftg. 5, (a) Table of poles and zeros for synthesis of recon- 
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passband performance. 
Fig. 6. Measured response (a) and table of poles and zeros
(b) for actual filter constructed using the parameters synthesized
in Fig. 5.

In addition, a frequency scaling factor has been entered to
specify a corner frequency at (ω = 1) = 10 kHz.

While the synthesized function has the correct magnitude
for the filter we want, the phase of the synthesized
function may not correspond to the phase of the filter we
are trying to synthesize. Let's ignore phase for a moment by
taking the squared magnitude:

$$|H(\omega)|^2 = H(\omega)H^*(\omega)$$



Curve fitting this function gives us ten poles, half of which
are in the right-hand plane (Fig. 4a). Discarding the right-hand
plane poles, and resynthesizing the resulting pole-zero
function, gives us the filter response desired (Fig. 4b).
An alternate approach would be to curve fit the original
synthesized function H(ω) with results very similar to those
used for Fig. 4b. The right-hand pole is then reflected into
the left-hand plane to arrive at a stable function with identical
magnitude response.

Fig. 7. Simple head positioning system for a disc drive.
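
The Chebyshev procedure can be reproduced numerically as a cross-check. The sketch below is an assumption-based illustration: it builds T₅ from the recursion T(n+1) = 2ωT(n) - T(n-1), forms 1 + ε²T₅²(ω), and finds the ten poles by polynomial root finding instead of curve fitting. The function name `chebyshev` and the use of the normalized frequency ω = 1 at the corner are choices made for this example.

```python
import numpy as np

def chebyshev(k):
    """Coefficients of T_k(w), lowest power first, via the recursion."""
    t_prev, t_cur = np.array([1.0]), np.array([0.0, 1.0])
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_next = 2.0 * np.concatenate([[0.0], t_cur])   # 2 w T_n
        t_next[:len(t_prev)] -= t_prev                  # minus T_(n-1)
        t_prev, t_cur = t_cur, t_next
    return t_cur

eps = 1.0
t5 = chebyshev(5)                        # [0, 5, 0, -20, 0, 16]

# Denominator of |H|^2 as a polynomial in w: 1 + eps^2 T5(w)^2.
d_w = eps ** 2 * np.convolve(t5, t5)
d_w[0] += 1.0

# Roots in w map to poles in s = jw; keep the left-half-plane half.
poles_s = 1j * np.roots(d_w[::-1])
lhp = poles_s[poles_s.real < 0]
real_pole = lhp[np.argmin(np.abs(lhp.imag))].real
```

Scaling `lhp` by the 10-kHz corner frequency gives values close to those listed in the curve-fit table, for example a real pole near -1.772k.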

Designing a Reconstruction Low-Pass Filter 

As a more complicated filter design example, we wish
to design a simple low-pass filter that will aid in the reconstruction
of an analog signal from digital data samples using
a digital-to-analog converter (DAC). The filter has three
requirements:

1. The filter must block alias components at 1.56 times
the passband corner frequency (the sample rate of the digital
system is 2.56 times the filter corner frequency). For
this particular application, 45 dB of alias protection is
considered adequate.

2. The filter must compensate for a (sin ω)/ω roll-off caused
by the DAC outputting rectangular pulses rather than
theoretically infinitely narrow impulses. After compensation,
the DAC and reconstruction filter together should
have a passband flatness better than ±0.5 dB.

3. The filter must be inexpensive, using a minimum of
second-order stages.

Fig. 9. Typical feedback control system.

To implement this design, the characteristic (sin ω)/ω
roll-off is first examined on a sample frequency of 51.2
kHz. This is approximated by the real part of:

$$[-16{,}300\,\exp(-j\,9.766\times10^{-6}\,\omega)]/(\omega - 10^{-6})$$

which the HP 3562A can synthesize as a pole at 10⁻⁶ Hz
with a gain factor of -16,300 and a time delay of 9.766 μs.

A small offset is added to ω in the denominator to avoid
dividing by zero. Curve fitting the (sin ω)/ω roll-off over a
0-to-20-kHz range indicates that the roll-off can be well
represented as a single heavily damped pole within this
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frequency range. This convinces us that the roll-off correc- 
tion can be easily handled by slightly modifying the pole 
locations of a standard low-pass filter design.
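
The numbers in the roll-off model can be related to the 51.2-kHz sample rate with a short calculation. This sketch rests on two assumptions that the article does not state explicitly: that the time delay is half a sample period, and that the synthesized pole term has the form gain/(jf) with f in hertz. The variable names are illustrative.

```python
import numpy as np

fs = 51.2e3                     # sample rate, Hz
tau = 1.0 / (2.0 * fs)          # half a sample period, about 9.766 microseconds
gain = 1.0 / (2.0 * np.pi * tau)   # about 16,300

f = np.linspace(1.0, 20e3, 2000)   # 0-to-20-kHz range, skipping f = 0
theta = 2.0 * np.pi * f * tau
rolloff = np.sin(theta) / theta    # DAC zero-order-hold roll-off

# Pole near the origin with gain -16,300 and a time delay of tau.
model = np.real(-gain * np.exp(-1j * theta) / (1j * f))

droop_db = 20.0 * np.log10(rolloff[-1])   # loss at 20 kHz
```

Under these assumptions the roll-off amounts to roughly 2.3 dB at 20 kHz, which is the droop the reconstruction filter has to make up.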

Accordingly, the fifth-order Chebyshev polynomial discussed
in the earlier example is tried, leading to the conclusion
that a zero is needed near 1.6 times the corner frequency
of the filter to make the transition to 45 dB of
attenuation quickly enough. Trial solutions for the pole
locations are determined by using the above Chebyshev
design technique with a passband ripple of ±0.25 dB. The
zero location chosen is based on the 1.56-times-the-corner-frequency
criterion (requirement 1 on page 29). Then, including
the (sin ω)/ω roll-off factor, the pole and zero locations
are modified manually by trial and error until the
desired performance is reached. The fast synthesis capabilities
of the HP 3562A make this manual adjustment approach
practical. The table in Fig. 5a lists the pole-zero
locations of the designed reconstruction filter. Fig. 5b
shows the performance of the filter by itself and Fig. 5c
shows the combined system performance. Fig. 5d shows
the detailed passband performance of the combined filter
and (sin ω)/ω system. While not quite optimal, the design
is flatter than can be achieved with the 1% resistors and
capacitors to be used in the circuit implementation.

Fig. 11. By curve fitting the resonance portion of the response
in (a) as shown in (b), the HP 3562A obtains the poles and
zeros listed in (c).

Once the circuit is constructed, the actual performance
can be measured using the HP 3562A's measurement
facilities (see Fig. 6a). Then the curve fitter is asked to find
the pole-zero locations actually obtained. Since the zeros
of the (sin ω)/ω part of the system are known exactly, based
on the system's sample frequency, the first four of these
values are entered explicitly in the curve fitter table (Fig.
6b). The curve fitter then considers the entered values to
be known constraints on the curve-fit pole-zero locations
to be found and it only solves for the unknown pole and
zero locations.

Because of component tolerances, stray capacitance, finite
op amp bandwidths, and other imperfections, the prototype
circuit performance is seldom exactly as designed.
The pole-zero locations can be adjusted again, based on 
the performance achieved in the first-pass design, so that 
the production filter will be as desired. 

A Servo Design Example 

As a more advanced example of the use of the HP 3562A's
synthesis and curve fit abilities, consider the task of designing
a servo control system. Fig. 7 is a sketch of the prototype
of a disc head positioning system. A voltage is applied to
an electromagnet attached to the disc head positioning arm,
and the electromagnetic field generated opposes the static
magnetic field of the permanent magnet, causing the positioning
arm to move to an equilibrium position. That
motion is detected by an accelerometer attached to the arm.
We would like to use the information provided by the
accelerometer to improve the response of the positioning
arm to the excitation voltage. The accelerometer provides
no static dc information on the position of the arm, so an
additional position feedback system will ultimately be required
for movement at very low frequencies.

Fig. 8a is a plot of the acceleration response of the original
system as a function of the frequency of the applied voltage.
The two initial primary problems with this response are
the strong resonance at 1.8 kHz and a sharp roll-off in the
response near dc. Fig. 8b shows the corresponding step
response. This is clearly a poor response for a positioning
system to have. We need to improve the response using a
feedback control system.

Fig. 9 shows a diagram of a typical feedback control
system, where G₂ is the existing electromechanical system
whose performance we would like to improve, G₁ is a filter
to control loop performance, k is a loop gain parameter,
and A is a precompensation filter to improve the output
response without changing the control loop performance.

As an initial try, close the loop without compensating
networks. Set A = 1, G₁ = 1, and increase k from zero until
G₂ begins to go unstable. Fig. 10 shows the resulting improvement
in the system as k is increased until the system
is almost unstable. However, even with very small values
of k the system becomes unstable, the response is dominated
by a very sharp resonance at 1.8 kHz, and the system
response is not significantly improved at frequencies below
1.8 kHz. This problem is typical of electromechanical control
systems. The pole is caused by the first significant
structural resonance of the positioning arm.

As we close the loop by increasing k, the system instability
occurs at the frequency where the open-loop phase
crosses 180 degrees. With the 180-degree phase shift, the
negative feedback becomes positive feedback, and when k
times the magnitude of G₂ is greater than one, we have an
oscillator (or a broken system).

The fundamental problem to solve is to keep the system
phase away from 180 degrees for as long as possible, and
then to bring the system loop gain below unity before the
phase does go to 180 degrees. A major portion of the phase
problem is caused by the resonance at 1.8 kHz and the
accompanying antiresonance at 2.2 kHz. Curve fit these
two features to find their actual frequencies and dampings
(Fig. 11). The poles at ±1.83 kHz and the zeros at ±2.31
kHz found by the curve fitter (Fig. 11c) are the actual poles
and zeros we are looking for. The others are computational
poles and zeros added by the curve fitter to compensate
for resonances outside the frequency range the curve fitter
examined.

As a first attempt at loop compensation, we place a zero
(-95 ± j1830) on top of the pole location just found, and
place a pole (-170 ± j2300) on top of the zero found. Fig.
12 shows the resulting compensated open-loop system performance.
While smoother, the phase still rolls toward 180
degrees faster than we would like, and there are a number
of difficult-to-control resonances around 2.4 kHz. Let's try
rolling off the loop gain below 2.4 kHz, but keep the loop
gain high at 200 Hz where there is a sharp structural resonance.

Fig. 13 shows the response of an additional G₁ element
we design to meet these loop roll-off goals. This element
is a pole pair at 250 ± j250 Hz. Unfortunately, this pole
pair creates added phase delay, greatly lowering the frequency
at which the G₁·G₂ system response crosses 180
degrees (Fig. 14). We ask the HP 3562A's curve fitter to
find an all-pass phase compensation network to solve this
phase problem. Setting the magnitude to unity and curve
fitting to the phase response as shown in Fig. 14 gives two
poles (-1721.15 and -300.517) and two zeros (287.… and
1558.58). To be strictly all-pass, the pole locations
Fig. 13. Response of additional G₁ element designed to meet
loop roll-off goals.

Fig. 14. Phase response with added G₁ element of Fig. 13.
This response is curve fitted to help find parameters for an
all-pass phase compensation network.



JANUARY 1987 HEWLETT-PACKARD JOURNAL 31

© Copr. 1949-1998 Hewlett-Packard Co.



must match the zero locations exactly. We choose σ₁
= 300 Hz and σ₂ = 1600 Hz for the pole-zero locations of
our all-pass phase compensation filter. This results in a
conditionally stable control loop, but the right choice of k
will give a stable response. Fig. 15a shows the table of the
total pole-zero locations for the combined G₁ loop compensation
network. With k = 80, the closed-loop system response
is as shown in Fig. 15b and Fig. 15c.
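
The all-pass property of the chosen compensation can be verified directly. The sketch below assumes real poles at -300 Hz and -1600 Hz mirrored by zeros at +300 Hz and +1600 Hz, with frequencies expressed in hertz; the function name `allpass` is hypothetical.

```python
import numpy as np

def allpass(f_hz, sigmas):
    """Response of a network with zeros at +sigma and poles at -sigma (Hz)."""
    s = 1j * np.asarray(f_hz, dtype=float)
    h = np.ones_like(s)
    for sigma in sigmas:
        h *= (s - sigma) / (s + sigma)
    return h

f = np.logspace(1, 4, 200)            # 10 Hz to 10 kHz
h = allpass(f, [300.0, 1600.0])
magnitude = np.abs(h)                 # unity at every frequency
phase_deg = np.degrees(np.unwrap(np.angle(h)))
```

Because each zero mirrors a pole exactly, only the phase varies with frequency, which is why the fitted pole and zero values have to be adjusted to match before the network is strictly all-pass.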

The system has greatly improved flatness, but there are
still troublesome system resonances above 3 kHz. Designing
a simple precompensation network A with poles at
-400 ± j500 for the overall closed-loop feedback system
gives the acceptable system response and corresponding
step response shown in Fig. 16.
Fig. 16. (a) Acceptable frequency response gained by adding
a precompensation network A. (b) Corresponding step
response.

Fig. 15. (a) Table of poles and zeros for the combined G₁
loop compensation network. (b) Frequency response of
closed-loop system. (c) Phase response of closed-loop system.
Troublesome resonances still exist above 3 kHz.






Curve Fitter for Pole-Zero Analysis 



by James L. Adcock



THE CURVE FITTER ALGORITHM in the HP 3562A
Analyzer finds a pole-zero model of a linear system
based on the system's measured frequency response.
The curve fitter does this by calculating a weighted least-squares
fit of a rational polynomial to the measured frequency
response:

$$H'(s) \approx \frac{a_0 + a_1 s + a_2 s^2 + \cdots}{b_0 + b_1 s + b_2 s^2 + \cdots}$$

Then the polynomials in the denominator and numerator
can be factored to find the poles and zeros of the measured
system (or alternatively the pole-residue or polynomial
form). Fig. 1 demonstrates actual curve fits by the HP 3562A
for very clean and very noisy measurements.
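
To make the idea of fitting a rational polynomial concrete, here is a minimal sketch of a linearized least-squares fit. It is not the HP 3562A's algorithm: it uses plain power-series coefficients, unit weighting, and fixes b₀ = 1 so that P(s) - H'(s)Q(s) = 0 becomes a linear problem. The function name `fit_rational` and the test system are assumptions for illustration.

```python
import numpy as np

def fit_rational(w, h, n_num, n_den):
    """Fit h ~ P(jw)/Q(jw) by linear least squares with b0 fixed at 1."""
    s = 1j * np.asarray(w, dtype=float)
    # Unknowns: a0..a_n_num, then b1..b_n_den.
    cols = [s ** i for i in range(n_num + 1)]
    cols += [-h * s ** i for i in range(1, n_den + 1)]
    m = np.column_stack(cols)
    # Stack real and imaginary parts so the coefficients stay real.
    m_ri = np.vstack([m.real, m.imag])
    rhs = np.concatenate([h.real, h.imag])
    coef, *_ = np.linalg.lstsq(m_ri, rhs, rcond=None)
    a = coef[:n_num + 1]
    b = np.concatenate([[1.0], coef[n_num + 1:]])
    zeros = np.roots(a[::-1])          # np.roots wants highest power first
    poles = np.roots(b[::-1])
    return zeros, poles

# Noise-free "measurement" of (s + 2) / (s^2 + 2s + 101).
w = np.linspace(0.1, 30.0, 400)
s = 1j * w
h_meas = (s + 2) / (s ** 2 + 2 * s + 101)

zeros, poles = fit_rational(w, h_meas, n_num=1, n_den=2)
poles = np.array(sorted(poles, key=lambda p: p.imag))
```

With noisy data the unweighted linear form tends to overemphasize high frequencies, which is one reason a practical fitter applies a weighting function.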

The curve fitter implements the inverse of the traditional
engineering design process, which attempts to predict the
measured response of a system based on an analytical
model. Instead, the curve fitter attempts to predict the
analytical model of the linear system based on the measured
response. If we are to close the loop on the engineering
design process, both analytical modeling tools and parameter
extraction (curve fitting) are necessary as follows:
1. Design a prototype system (using the HP 3562A's synthesis
capabilities, see article on page 25).



Fig. 1. HP 3562A curve fit to a very clean measurement (a)
and a very noisy measurement (b).



2. Build the prototype.

3. Measure the prototype's frequency response (using any
of the HP 3562A's three measurement modes).

4. Extract the prototype's pole-zero parameters (using the
HP 3562A's curve fitter on the measured response).

5. Compare the results to the original design goals.

6. Modify the results (using the HP 3562A's math and synthesis
capabilities) to arrive at an improved design.

7. Go to step 2 and repeat the process until the desired
design goals are achieved.

Theory 

In discussing the theory behind the HP 3562A's curve
fitter, the following variables and definitions are used:

$\omega$          sampled, normalized frequency, such that $\omega_{max} = 1$
$\omega_{min}$    lowest frequency to be curve fitted
$\omega_{max}$    highest frequency to be curve fitted
$j$               $\sqrt{-1}$
$H[\omega]$       modeled frequency response function to be determined
$H'[\omega]$      measured frequency response function
$P$               numerator polynomial in $\omega$
$Q$               denominator polynomial in $\omega$
$W$               weighting function
$e[\omega]$       weighted error function
$E$               sum of the squared error
$T_n$             nth-order Chebyshev polynomial
$\phi$            numerator orthogonal polynomials
$\theta$          denominator orthogonal polynomials
$A$               numerator polynomial coefficients to be determined
$B$               denominator polynomial coefficients to be determined
$M^T$             conjugate transpose of a matrix $M$
$O$               null matrix, all elements = 0

The theoretical frequency response function of a linear
system can be expressed as the quotient of two polynomials:

$$H[\omega] = P[\omega]/Q[\omega] \tag{1}$$

where $\omega$ is the sampled, normalized frequency. This is in
rational fraction form. By using any of the linear resolution,
log resolution, or swept sine measurement modes of the
HP 3562A, we obtain the measured frequency response
function:

$$H'[\omega] \tag{2}$$

in real and imaginary form. We want to find the coefficients
of $P[\omega]$ and $Q[\omega]$ such that the error, or the weighted
difference of $H[\omega]$ and $H'[\omega]$, is minimized. This can be
written (leaving out the functional dependence on $\omega$):

$$e' = W'(H' - H) \tag{3}$$

where $W'$ is a weighting function to be used in the weighted
least-squares derivation to improve the overall quality of
the curve fit.

Fig. 2. The shapes of power series become very similar for
high orders over most of the normalized frequency range.

P and Q could be defined to be just the ordinary power
series in $j\omega$. Then P/Q would be the ordinary rational fraction
representation of the frequency response function. However,
ordinary power series have several very bad characteristics
when used for curve fitting. First, they have a very
large dynamic range. Second, for the higher orders, the
shapes of the power series become very similar over most
of the normalized frequency range (see Fig. 2). These properties
lead to numerically very ill-conditioned matrices
when power series are used to derive least-squares equations.

In contrast, Chebyshev polynomials have a small
dynamic range, being bounded in amplitude between -1
and 1, and each Chebyshev polynomial has a unique shape
in the normalized frequency range where they will be used
(see Fig. 3). This makes these polynomials particularly
well-suited for least-squares derivations. Thus, the HP
3562A curve-fit algorithm uses Chebyshev polynomials instead
of ordinary power series.
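The conditioning argument can be checked numerically. The following is only an illustrative sketch, not the analyzer's firmware: it assumes a uniform grid of 801 points on the normalized interval [-1, 1] (both the grid and the function name `basis_condition_numbers` are hypothetical choices) and compares the condition number of a power-series basis matrix with that of a Chebyshev basis matrix of the same order.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def basis_condition_numbers(order, n_points=801):
    """Return (power-series, Chebyshev) condition numbers of the basis matrices."""
    # Assumed grid: uniformly spaced normalized frequencies on [-1, 1].
    w = np.linspace(-1.0, 1.0, n_points)
    power = np.vander(w, order + 1, increasing=True)  # columns 1, w, w^2, ...
    cheb = C.chebvander(w, order)                      # columns T_0, T_1, ...
    return np.linalg.cond(power), np.linalg.cond(cheb)


for order in (5, 10, 20):
    p, c = basis_condition_numbers(order)
    print(f"order {order:2d}: power series {p:.2e}, Chebyshev {c:.2e}")
```

The gap between the two condition numbers widens quickly with order, which is the practical reason a least-squares fit built on power series loses accuracy long before a Chebyshev-based fit does.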

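From the following defining equations for Chebyshev
polynomials it can be seen that for every series of
Chebyshev polynomials of order n, there is an equivalent
nth-order power series:

$$
\begin{aligned}
T_0[\omega] &= 1 \\
T_1[\omega] &= \omega \\
T_n[\omega] &= 2\omega\,T_{n-1}[\omega] - T_{n-2}[\omega] \\
\text{also} \qquad T_{-n}[\omega] &= T_n[\omega] \\
2\,T_n[\omega]\,T_m[\omega] &= T_{n+m}[\omega] + T_{n-m}[\omega] \\
T_n[\omega] &= \cos(n \cos^{-1}\omega)
\end{aligned}
\tag{4}
$$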

Thus, after solving for the coefficients of P and Q expressed
in Chebyshev polynomials, the equivalent coefficients
of the same order power series can be found. For
reference, here are the first few Chebyshev polynomials
expressed in power series:



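$$
\begin{aligned}
T_0[\omega] &= 1 \\
T_1[\omega] &= \omega \\
T_2[\omega] &= 2\omega^2 - 1 \\
T_3[\omega] &= 4\omega^3 - 3\omega \\
T_4[\omega] &= 8\omega^4 - 8\omega^2 + 1 \\
T_5[\omega] &= 16\omega^5 - 20\omega^3 + 5\omega
\end{aligned}
\tag{5}
$$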
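The table in equation set 5 follows mechanically from the recurrence in equation set 4. As a small sketch (the function name `chebyshev_power_coeffs` is hypothetical), the power-series coefficients can be generated like this:

```python
import numpy as np


def chebyshev_power_coeffs(n_max):
    """Power-series coefficients (lowest order first) of T_0 .. T_n_max,
    built from the recurrence T_n = 2*w*T_(n-1) - T_(n-2)."""
    coeffs = [np.array([1]), np.array([0, 1])]
    for n in range(2, n_max + 1):
        shifted = np.concatenate(([0], 2 * coeffs[n - 1]))  # 2*w*T_(n-1)
        padded = np.concatenate((coeffs[n - 2], [0, 0]))    # T_(n-2), padded
        coeffs.append(shifted - padded)
    return coeffs[: n_max + 1]


for n, c in enumerate(chebyshev_power_coeffs(5)):
    print(f"T_{n}:", c.tolist())
```

Each printed list reads from the constant term upward, so the entry for T_4 corresponds to 1 - 8w^2 + 8w^4, matching the table.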



To solve for up to 40 poles and 40 zeros, the HP 3562A
curve fitter requires Chebyshev polynomials up to $T_{80}[\omega]$.

The second-to-last equation in equation set 4 is equivalent
to the ordinary trigonometric product-equals-sum-plus-difference
formula. In addition, consider that the
Chebyshev polynomials can be thought of as a frequency
warped set of cosines. As a result, many of the concepts
of fast Fourier transforms (FFTs) can be used to hasten the
solutions of these equations. In fact, we will show later
how these trigonometric identities are used to speed up,
by an order of magnitude, the calculation of a major portion
of the curve-fit algorithm.

Define the two sets of orthogonal polynomials $\phi_i = T_i$
and $\theta_k = T_k$, where $T_i$ and $T_k$ are the ith-order and kth-order
Chebyshev polynomials in $\omega$, such that:

$$P = \sum_i a_i \phi_i \tag{6}$$

$$Q = \sum_k b_k \theta_k \tag{7}$$

or, in matrix form:

$$P = \phi A \tag{8}$$

$$Q = \theta B \tag{9}$$

Fig. 3. Chebyshev polynomials have different shapes over
the normalized frequency range.

Fig. 4. Weighting function for the curve fit shown in Fig. 1b.



Substituting equations 1 and 2 into equation 3, and multiplying
through by Q (a necessary trick to make the resulting
least-squares equations linear), we arrive at:

$$e = (WQH' - WP) \tag{10}$$

where $W$ is equal to $W'$ with the pole locations deemphasized.
Since $W$ must be determined before P and Q are
solved, $W$ is preweighted by an experimentally derived
pole location estimation technique. Then the
multiplied-through-by-Q formulation gives very similar results
to the original nonlinear P/Q formulation: $e' \approx e$. We have
found that increasing the weighting of the zero locations,
as well as the pole locations, gives improved results. This
maximizes the quality of fit when viewed on a decibel scale
rather than a magnitude-squared scale. In addition, the
weighting function is improved by deemphasizing regions
of particularly noisy data. Fig. 4 shows the weighting function
corresponding to the curve fit shown in Fig. 1b and
Fig. 5 shows the equivalent curve fit if the weighting function
is set to unity.

Using equations 8 and 9, the formulation becomes:

$$e = (WH'\theta B - W\phi A) \tag{11}$$

or,

$$e = (W_b\theta B - W_a\phi A) \tag{12}$$

where $W_a = W$ and $W_b = WH'$.
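The sum of the squared errors is given by:

$$E = \sum_{\omega} (B^T\theta^T W_b^T - A^T\phi^T W_a^T)(W_b\theta B - W_a\phi A)$$

To minimize the squared error, set $\partial E/\partial A$ and $\partial E/\partial B = 0$.
This gives two equations:

$$\frac{\partial E}{\partial A} \propto \sum_{\omega} \phi^T W_a^T (W_b\theta B - W_a\phi A) = O$$

$$\frac{\partial E}{\partial B} \propto \sum_{\omega} \theta^T W_b^T (W_b\theta B - W_a\phi A) = O$$

Written in partitioned matrix form:

$$\begin{bmatrix} F & -D \\ -D^T & G \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} O \\ O \end{bmatrix} \tag{13}$$

where

$$F = \sum_{\omega} W_a^T W_a\,\phi^T\phi \qquad G = \sum_{\omega} W_b^T W_b\,\theta^T\theta \qquad D = \sum_{\omega} W_a^T W_b\,\phi^T\theta \tag{14}$$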
Two problems with equation set 14 must be addressed
before we have a practical solution to the curve fitting
problem. First, $|W_b|^2$ consists of $|W|^2|H'|^2$, where $H'$ typically
has some noise on its measurement. Let

$$H' = H'' + e_h$$

where $H''$ is the ideal frequency response and $e_h$ is the
random error caused by measurement noise (assumed to
be not correlated with $H''$). Then the expected value of
$|H'|^2$ is:

$$\langle |H'|^2 \rangle = |H''|^2 + e_h^2$$

$|H'|^2$ is therefore typically a biased estimate of $|H''|^2$. This
was found to lead to biased estimates of the curve fit and
hence to biased pole-zero locations. Once we discovered the source
of this bias, the solution was simple: subtract an estimate
of $e_h^2$ from $|H'|^2$. Fig. 6 shows the results when this bias
effect is not corrected for.

Fig. 5. Curve fit for Fig. 1b when the weighting function is
set to unity.

Fig. 6. Curve fit for Fig. 1b when the bias caused by the
random error introduced by measurement noise is not removed.
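A quick simulation illustrates why the magnitude-squared term is biased and how the subtraction removes the bias. This is a sketch under stated assumptions: the noise is modeled as zero-mean circular complex Gaussian with known power `noise_sigma**2`, and the function name is hypothetical. The text does not say how the analyzer estimates the noise power, so that step is simply taken as given here.

```python
import numpy as np

rng = np.random.default_rng(0)


def noisy_power_bias(h_ideal, noise_sigma, n_averages=200_000):
    """Average |H'|^2 over many noisy measurements H' = H'' + e_h.

    Returns (raw estimate, estimate with the noise power subtracted).
    """
    # Assumed noise model: circular complex Gaussian, total power noise_sigma**2.
    noise = noise_sigma / np.sqrt(2) * (
        rng.standard_normal(n_averages) + 1j * rng.standard_normal(n_averages)
    )
    measured = h_ideal + noise
    raw = np.mean(np.abs(measured) ** 2)
    corrected = raw - noise_sigma ** 2  # subtract the estimate of e_h^2
    return raw, corrected


h_ideal = 0.5 + 0.2j
raw, corrected = noisy_power_bias(h_ideal, noise_sigma=0.3)
print("true |H''|^2 :", abs(h_ideal) ** 2)
print("raw average  :", raw)
print("corrected    :", corrected)
```

Averaging does not remove the offset in the raw estimate, because the noise power adds to the magnitude squared rather than averaging to zero.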

The second problem that must be solved to have a usable
algorithm concerns calculation speed and memory storage
requirements. A curve fit with 40 poles and 40 zeros requires
the solution of an 80-by-80 matrix: 6400 double-precision
floating-point numbers. This exceeds the memory
available in the HP 3562A for storing the matrix. In
addition, the 6400 numbers are each constructed from the
product and summation of up to 800 measured complex
frequency response data points. Thus, several million
double-precision floating-point calculations can be required to
construct the matrix. We needed a way to reduce these
memory storage requirements and to hasten the calculations
of the matrix elements if the curve fitter was to be
usable for more than a dozen poles and zeros.

Fortunately, the Chebyshev product = sum + difference
relationship:

$$2T_iT_k = T_{i+k} + T_{|i-k|}$$

provides the solution. This relationship can be applied
simultaneously to each of the four submatrices G, D, $D^T$,
and F, resulting in great savings in calculation time and
storage space. The following discussion uses submatrix G
to illustrate how this relationship is used to obtain the
necessary calculation speed and memory savings:

$$
\begin{aligned}
G_{ik} &= \sum_{\omega} |W_b|^2\,T_k T_i \\
       &= \tfrac{1}{2}\sum_{\omega} |W_b|^2\,T_{k+i} + \tfrac{1}{2}\sum_{\omega} |W_b|^2\,T_{|k-i|}
\end{aligned}
$$

At first glance, this appears to have doubled the number
of summations that need to be performed. However, now
many of the summations are identical, so that we only need
to perform each different summation the first time we encounter
it in the calculation of the submatrix. After that,
when we encounter a particular summation again, we can
use its previously calculated value. Let $G'$ denote the
half-sums over $T_{k+i}$ and $G''$ the half-sums over $T_{|k-i|}$.
For example, $T_{1+5} = T_{2+4} = T_{3+3} = T_{4+2} = T_{5+1}$,
so that all of the summations along the diagonals of $G'$ are
identical. Likewise, all summations along the opposite
diagonals of $G''$ are identical. Thus, for an i×k submatrix
there are only 2(i + k) different summations that need to be
performed. This reduces the number of summations that must
actually be calculated by an order of magnitude. Similarly,
G no longer needs to be actually stored in memory, since each
individual element $G_{ik}$ can be quickly recalculated whenever
needed from $G'_{k+i}$ and $G''_{|k-i|}$. That is,
$G_{ik} = G'_{k+i} + G''_{|k-i|}$. This solves
our memory storage requirements.
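The saving can be sketched in a few lines. The code below is an illustration with hypothetical names, not the instrument's implementation: it forms G directly from products of Chebyshev polynomials, then again from a single vector of 2n - 1 weighted sums indexed by i + k and |i - k|, and the two results agree.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def g_direct(w, weight_sq, n):
    """G_ik = sum over frequency of |W_b|^2 T_i T_k, one sum per element."""
    T = C.chebvander(w, n - 1)             # T[:, k] = T_k(w)
    return (T * weight_sq[:, None]).T @ T


def g_from_sums(w, weight_sq, n):
    """Same matrix from only 2n - 1 distinct sums s[m] = sum |W_b|^2 T_m."""
    T = C.chebvander(w, 2 * n - 2)          # T_0 .. T_(2n-2) are sufficient
    s = weight_sq @ T
    i, k = np.indices((n, n))
    return 0.5 * (s[i + k] + s[np.abs(i - k)])


rng = np.random.default_rng(1)
w = np.linspace(0.0, 1.0, 801)              # assumed normalized frequency grid
weight_sq = rng.uniform(0.1, 1.0, w.size)   # stand-in for |W_b|^2
n = 12
print(np.allclose(g_direct(w, weight_sq, n), g_from_sums(w, weight_sq, n)))
```

In a memory-limited setting only the vector `s` would be kept, and each element of G would be rebuilt on demand from two of its entries, which is the storage argument made in the text.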

Finally, the solution to the homogeneous set of equations
(equation set 13) gives the coefficients of polynomials A
and B. Special matrix techniques are required to maintain
accuracy for the high-order equations desired. Of the techniques
evaluated, a Gram-Schmidt orthogonalization technique¹
gives the most accurate results. Equations 6 and 7
yield the Chebyshev coefficients of the rational fraction
polynomials P and Q. The coefficients are converted to
coefficients of the ordinary power series in $j\omega$, the zeros
are found using common zero finder techniques,²,³ and
then the pole and zero locations are renormalized to their
original values by multiplying through by $1/\omega_{max}$. For design
examples using the HP 3562A's curve fitter, see the
article on page 25.
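The overall flow (linearized error, homogeneous solution, conversion to a power series, root finding) can be sketched end to end on synthetic data. Several things here are assumptions rather than the analyzer's method: the homogeneous system is solved with an SVD instead of the Gram-Schmidt technique cited above, the weighting defaults to unity, the coefficients are allowed to be complex, everything stays in normalized frequency, and all function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def cheb_to_power(c):
    """Chebyshev-series coefficients (may be complex) to power-series
    coefficients, lowest order first, using the T_n recurrence."""
    n = len(c)
    basis = [np.array([1.0]), np.array([0.0, 1.0])]
    for k in range(2, n):
        basis.append(np.concatenate(([0.0], 2 * basis[k - 1]))
                     - np.concatenate((basis[k - 2], [0.0, 0.0])))
    out = np.zeros(n, dtype=complex)
    for k in range(n):
        out[: k + 1] += c[k] * basis[k]
    return out


def fit_rational(w, h_meas, n_zeros, n_poles, weight=None):
    """Minimize the linearized error e = W H' theta B - W phi A."""
    weight = np.ones_like(w) if weight is None else weight
    phi = C.chebvander(w, n_zeros)      # numerator basis
    theta = C.chebvander(w, n_poles)    # denominator basis
    m = np.hstack((-weight[:, None] * phi,
                   (weight * h_meas)[:, None] * theta))
    # Homogeneous problem: right singular vector of the smallest singular
    # value (an SVD stand-in for the Gram-Schmidt method in the text).
    x = np.linalg.svd(m, full_matrices=False)[2][-1].conj()
    return x[: n_zeros + 1], x[n_zeros + 1:]


def poles_and_zeros(a, b):
    """Roots expressed in s = j*w, in normalized frequency units."""
    zeros = 1j * np.roots(cheb_to_power(a)[::-1])
    poles = 1j * np.roots(cheb_to_power(b)[::-1])
    return poles, zeros


# Synthetic "measurement": one zero at s = -0.3, poles at s = -0.1 +/- j*sqrt(0.24).
w = np.linspace(0.01, 1.0, 400)
s = 1j * w
h = (s + 0.3) / (s**2 + 0.2 * s + 0.25)
a, b = fit_rational(w, h, n_zeros=1, n_poles=2)
poles, zeros = poles_and_zeros(a, b)
print("poles:", poles)
print("zeros:", zeros)
```

With noise-free data and the correct orders the recovered roots match the synthetic system closely. Noisy data is where the weighting and bias correction described earlier start to matter.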
Performance Analysis of the HP 3000
Series 70 Hardware Cache

Measurements and modeling pointed the way to improved
performance over the Series 68.

by James R. Callister and Craig W. Pampeyan



THE HP 3000 SERIES 70 is a recent addition to the
HP 3000 Business Computer product line. Its design
objectives were to provide a significant upgrade to
the Series 68 in a short design cycle. The performance of
the Series 70 was engineered through adherence to a strict
methodology of measurement, modeling, and verification.
This paper gives a description of the methodology with
examples drawn from the major component of the Series
70: the cache memory subsystem.

Cache memories are notorious for being poorly characterized.¹
At the system level, caches appear to be nondeterministic,
which makes them very difficult to model. In
addition, they are very sensitive to the system workload.
The design of a cache provides a severe test for any estimation
methodology.

An outline of the generic performance engineering cycle
is shown in Fig. 1. The steps of characterization (modeling,
design analysis and tracking, benchmarking and product
testing) are straightforward. Yet rigorous adherence to precision
in each step returns enormous dividends in overall
accuracy, thus ensuring a product's success.² A precise
methodology includes the use of sampling theory, statistical
analysis, and measurement validation. The overall
methodology includes the additional benefit that the field
tests in the final stage of a product design are also used to
provide customer characterization data in the first stage of
the next product.

Current System Characterization 

The major goals of the Series 70 were to maximize per- 
formance improvement and minimize change to the Series 
68. These conflicting goals were resolved by extensive mea- 
surements of current Series 68 systems. Measurements of 
systems in the field indicated how the systems were being 
used and which system components were being stressed. 
Measurements were also used to construct a characterization
of the customer base. Any proposed performance enhancement
could then be evaluated in terms of its effect
across the entire customer base.

Measurement tools play a crucial role in the performance
design methodology. It is important for tools to be chosen
that both accurately measure the metrics of interest and
are easy to use (see "Measurement Tools," next page).
Sometimes the right tool does not exist. In a development
lab, late data is the same as no data. If a tool must be
developed, the development time must be short enough to
deliver timely data. The Series 70 performance design team
made use of both existing tools and new tools. The new
tools were largely leveraged from existing products.

HPSnapshot 

The premier measurement tool for HP 3000 systems is
HPSnapshot. HPSnapshot is a suite of software collection
tools, reduction programs, and report generators. The tools
measure such things as peripheral activity, system resource
utilization, process statistics, and file use. The collected
data is stored on tape and sent to a dedicated system for
reduction. HPSnapshot is easy to use and allows many
systems to be measured quickly.³

HPSnapshot runs were made on fifty Series 68s in the
field. Individual runs were analyzed and selected metrics
were stored in a data base. This data was used to characterize
customer workloads and to analyze bottlenecks restricting
additional performance. The measurements and
resulting analysis showed that the best performance opportunities
lay in improving the CPU. Several teams were
created to investigate the areas suggested by the data (see
"The Series 70: Not Just a Cache," page 40).

Hardware Monitor 

The HPSnapshot data suggested that more extensive measurements
be made on the CPU itself. Several tools were
created to make these measurements. One of these tools,

Measurement Tools

Computer systems are extremely complex, requiring [...].
Measurements, too, must be made at different levels to
provide a complete system characterization.
For a particular metric, it is important to choose the best tool.
The tools can be compared in terms of what they can measure,
their ease of use, the overhead they incur, and their sampling
frequencies. Table I compares the tools for each category.

Hardware measurement tools analyze signals from the
machine under test. Most hardware tools are passive and do not
disturb the system. They are fast enough to capture individual
machine cycles and therefore are very good at collecting
low-level traces and sequences. The short-term sampling frequency
can be very high. Hardware tools cannot measure higher levels
of the system such as processes and programs. They require
equipment attached to the system under test and extensive setup
time.

Machines such as the HP 3000 Series 68 have writable control
store, allowing the use of microcoded tools. The sampling rate
is slower than hardware tools and the extra microcode takes
some minimal amount of system overhead. Microcode tools are
best suited for obtaining statistics at the procedure and process
level. They are also very good at sampling at fixed time intervals.
The installation of the tools is straightforward but does require a
cool start of the system.

The most abstract level of measurement requires software running
on the system under test. There can be significant overhead
associated with software tools, which may also perturb the system.
But software tools can track paths in system level algorithms,
monitor interactions between processes, and log a large number
of software event counters. Software tools are generally very
easy to install and use.



Table I
Comparison of Measurement Techniques

Monitor     Overhead   Ease of Use           Sampling Frequency   Type of Measurements
Hardware    0%         Hardware required     fGP-1Cflte           Signals, short traces
Microcode   0.1-1%     Coolstart required    it                   Fixed-time sampling, procedures, processes
Software    5-10%      Simple installation   lO^s                 Software event counters, system interactions

The Series 70 performance analysis made use of each type
of tool. Besides the HPSnapshot and hardware monitor tools
mentioned in the accompanying article, several microcode-based
tools were used.

Microcode Tools

The Instruction Gatherer records the current executing instruction
at one-millisecond intervals. It also gives some information
about the most frequent subcases of the instructions. The data
from ten sites gave a time-weighted profile of which instructions
were executed most often. This information led to the remicrocoding
of two instructions for increased performance.

The Microsampler is a simple microcoded version of the Sampler
software tool. It records program counter values in the
operating system code. From this information, six high-frequency
procedures were identified and rewritten in microcode.

The cache post microcode is special-purpose microcode used
to investigate alternative solutions to cache posting. The microcode
validated the use of the cache simulator for the posting
circuitry investigation.



the hardware monitor, provided invaluable measurement
capability.

The monitor consists of an HP 1630 Logic Analyzer
coupled to an HP Touchscreen Personal Computer via the
HP-IB (IEEE 488), as shown in Fig. 2. The probes of the HP
1630 are attached to pins on the backplane of the system
under test (HP 3000 Series 64, 68, or 70). The Touchscreen
computer serves as controller, reduction tool, and data storage
for the HP 1630. The monitor can automatically run
a series of independent tests with a simple command file.

For each test, the Touchscreen computer begins by downloading
configuration information to the HP 1630. The measurement
is then started on the HP 1630. The Touchscreen
waits for a specified time, then halts the measurement. The
collected data is uploaded to the Touchscreen computer
where it is reduced and stored to disc. A side benefit of
the automated process is that the uploaded data from the
HP 1630 is more detailed than that available through manual
operation. Careful analysis of the internal operation of
the HP 1630 gave us confidence that it would satisfy statistical
sampling demands.

A collection of ten tests was run on four HP 3000 Series
68 systems. These systems were chosen based on how well
they represented the customer base, as determined from
the HPSnapshot data. The tests measured many variables
including instruction paths, I/O use, and cache statistics.
The tests required a total of 59 probes. It was calculated
that half-hour samples would best meet the conflicting
demands of aggregate and variance analysis. Each sample
captured about half a million cycles, and a total of 144
samples were collected.

The Series 70: Not Just a Cache

The HP 3000 Series 70 Business Computer is a collection of
performance enhancements. To be included in the Series 70
product, each enhancement had to qualify in terms of measurable
performance, applicability across the customer base, and
independence of the other enhancements. Each enhancement was
the result of the methodology of measurements and analysis
used in the Series 70 cache design and described in the
accompanying article.

Microcode was written to find the most common instruction
executions. Two instructions that were surprisingly prominent in
the mix were rewritten to execute faster.

Microcode was also written to find the most often used MPE
procedures. This information led to the selection of six procedures
to be rewritten in microcode. The microcoded versions of
these procedures execute up to ten times faster.

The expanded main memory offered on both the Series 68
and 70 was also subjected to measurement and analysis. The
effect of additional memory on the multiprogramming level, the
number of physical I/Os, and the CPU utilization were studied.
Methods to identify when main memory bottlenecks occur were
developed, which resulted in projections of performance
improvements resulting from additional memory.

There were also several improvements in the MPE operating
system software. The HPSnapshot tool revealed areas where
changes could have a significant effect on system performance.
Analytic models helped estimate the relative merits of several
proposed algorithmic changes in the memory manager, dispatcher,
and disc driver. The most beneficial changes were
implemented by the MPE software engineers. They were then
run through many benchmarks to verify the performance
improvements.

Although not part of the Series 70, the HP 7933/35XP cached
disc drive was also a result of the performance engineering cycle
described here. Important workload parameters were identified.
A comparative study of the I/O subsystem performance with MPE
disc caching, no caching, and the HP 7933/35XP was conducted.
Engineers at HP's Disc Memory Division worked closely with the
commercial systems performance engineers to model various
design alternatives. Benchmarks were run on prototypes of the
enhanced disc drives and the performance gains were verified.
The system workload characteristics under which the cached
discs performed best were explicitly identified to ensure customer
satisfaction.

All of these components were evaluated with respect to each
other and to the system as a whole. The performance increases
of the components complement each other and keep the system
balanced. The result of the Series 70 project is a product offering
a 20% to 35% increase in system throughput over a Series 68.
Expanded main memory and the HP 7933/35XP can provide
additional performance improvements.

The data gave insight into the low-level activities of the
CPU. As a result, a number of performance opportunities
were identified. The biggest opportunity lay in improving
the cache memory subsystem.

The Series 68 cache is a 4K-word, 2-set cache (see "How
a Cache Works," page 42). The hit rate measured on the
four systems was 92.5%. However, although a cache miss
occurs only 7.5% of the time, almost 30% of the CPU time
is spent waiting for cache misses to be resolved. During
this time, the CPU is frozen and cannot proceed. A simple
model of the cache was created and validated through the
hardware monitor. The model suggested that if the hit rate
could be improved by 5 or 6 percent, the CPU would freeze
only about 10% of the time. This would result in a savings
of almost 20% of the CPU cycles, which translates into an
effective speedup of about 25%. The availability of denser
RAMs and programmable array logic parts (PALs) indicated
a strong possibility for just such an improvement.
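The step from "20% of the cycles saved" to "about 25% speedup" is simple arithmetic, sketched below. The helper name is hypothetical, and the saved fraction is taken as the difference between the two freeze percentages quoted above, which is an approximation rather than the authors' exact model.

```python
def effective_speedup(cycles_saved_fraction):
    """Speedup when the same work completes in (1 - saved) of the original cycles."""
    return 1.0 / (1.0 - cycles_saved_fraction)


old_freeze = 0.30   # fraction of CPU time spent waiting on misses (measured)
new_freeze = 0.10   # fraction predicted by the model after the improvement
saved = old_freeze - new_freeze
print(f"cycles saved: {saved:.0%}, speedup: {effective_speedup(saved) - 1:.0%}")
```

Removing one fifth of the cycles leaves four fifths of the original run time, and 1/0.8 = 1.25, hence the 25% figure.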

Modeling 

Modeling extracts the essential characteristics of a system
and converts them into a form that is easily evaluated
and modified. There are two major types of modeling. Analytic
modeling computes steady-state values of a system
according to laws of queuing theory.⁴ Analytic models are
convenient and can guarantee correct results if the input data is
correct and complete. The disadvantages lie in restrictions
placed on the type of environments that can be modeled.
Typically, simplifications must be made to the environment
to make the analysis tractable.

Simulation is also modeling, but at a lower level of
abstraction. In a Monte Carlo simulation model, the key
system variables are represented by their measured probability
distributions. Random numbers are applied to the
model to generate specific values for the key system variables.
These specific values are then used to compute the
output variables of interest.

Trace-driven simulations are at a still lower level of
abstraction. Traces of values for each of the key system
variables are collected and applied together to a deterministic
model of the system. The output variables of interest
are computed for the specific set of trace data.

Simulation models can be constructed and solved for
virtually any system. However, although simulation models
provide valuable information, their limitations must
also be recognized and understood. For example, simulation
models cannot guarantee a steady-state solution, but
only a particular solution corresponding to the input data.

Any model runs the risk of ignoring some unknown, yet
essential feature of the system. Additional dangers exist in
the measurement of system variables and construction of
the model. Careful modeling subjects the recorded variables
to independent tests of correctness. The model construction
can then be validated by modeling the existing
system and comparing the results to measurements of the
existing system.

After validation, the model is used in design analysis.
Various changes can be introduced and the performance
changes observed. The set of changes that maximizes performance
(and satisfies constraints of cost, design time,
etc.) is then chosen as the final design. The final design is
then modeled and the performance for the product predicted.

There are currently no good system variables that will
accurately predict how a cache will perform. This makes
it impossible to construct analytic or Monte Carlo simulation
models that accurately predict cache performance.
Currently, the best way to model caches is through
trace-driven simulators. The Series 70 cache design used both
analytic and trace-driven simulation. A simple analytic
model at the system level was constructed using the results
of the trace-driven cache simulator.

The cache modeling process involved collecting traces 
of memory reference addresses. Software was written to 
simulate various cache organizations. The traces were then 
run through the simulator under various cache organiza- 
tions. 
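As a rough illustration of what a trace-driven cache model involves, here is a minimal sketch. It is not the simulator described in this article: LRU replacement, word addressing, and the class and parameter names are all assumptions, and `ways` stands for what the article calls the number of sets (a "2-set" cache corresponds to `ways=2`).

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Minimal trace-driven model: word-addressed, LRU replacement (assumed)."""

    def __init__(self, total_words, ways, block_words=1):
        self.block_words = block_words
        self.ways = ways
        self.num_rows = total_words // (ways * block_words)
        self.rows = [OrderedDict() for _ in range(self.num_rows)]
        self.hits = self.misses = 0

    def access(self, address):
        block = address // self.block_words
        row = self.rows[block % self.num_rows]
        tag = block // self.num_rows
        if tag in row:
            row.move_to_end(tag)           # mark as most recently used
            self.hits += 1
        else:
            if len(row) >= self.ways:
                row.popitem(last=False)    # evict the least recently used tag
            row[tag] = True
            self.misses += 1

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(trace, **organization):
    """Run one address trace through one cache organization."""
    cache = SetAssociativeCache(**organization)
    for address in trace:
        cache.access(address)
    return cache.hit_rate()


# Hypothetical toy trace: a four-word loop repeated ten times.
trace = [0, 1, 2, 3] * 10
for words in (2, 4):
    print(words, "words:", simulate(trace, total_words=words, ways=2))
```

Running the same trace through several organizations, as the loop at the end does for two sizes, is the basic pattern behind the parameter studies described later in the article.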

Ideally, the traces would have been collected from a
Series 68. However, the speed of the Series 68 prevents
data collection without special high-speed hardware. Instead,
a Series 37 was chosen. A special memory controller
that reads the memory reference address and writes it to
its own local memory was constructed quickly. The special
controller is passive and traces all system activity, including
the operating system. One million consecutive memory
references can be collected. This number is sufficient to
guarantee many calls to the operating system, and also
includes many task switches. Three different customer environments
were run to collect the data. Approximately
nine measurements were collected for each environment,
for a total of 30 million memory references (see "Realistic
Cache Simulation," page 45). The traces were then subjected
to a series of tests to confirm that the collected data
was correct.

Cache Simulator 

The cache simulator was developed in parallel with the
trace collection process. The simulator takes the trace data and
applies it to various models of the cache. The simulator
development consisted of two phases. The first phase concentrated
on completeness and correctness of the model
implementation, the ease of use, and the choice of statistics
to be kept. These goals led to a modular structure, very
detailed statistics, and gave considerable freedom in adding
and altering features. This flexibility has allowed the
simulator to be leveraged to model caches for several other
machines besides the Series 70.

It is extremely important that the simulator model the
cache designs accurately. Artificial data was generated and
simulated and then compared with the expected results to
verify the accuracy of the cache simulator. Next, the Series
68 cache was modeled with the trace data. The same environment
was then run on the Series 68 and the actual cache
statistics were collected with the hardware monitor. The
close correlation between the simulated and actual results
gave a high level of confidence that the simulator would
provide reasonable information on which to base the design
of the Series 70 cache.

Phase two of the simulator development concentrated
on the speed of the simulator by porting it to a mainframe
computer. During the port, unnecessary code and structure
were eliminated. The verification tests were then run again
and compared to the original simulator output. The streamlining
and port to the mainframe resulted in a functionally
equivalent simulator that runs 80 times faster. A simulation
of a single trace of one million memory references now
takes about 45 seconds.

A key feature of the simulator is the ability to vary seven
different parameters: total size of the cache, associativity,
block size, algorithms for handling writes, block replacement
strategies, the handling of I/O requests, and tag indexing.
Simulations were run varying each of the parameters. The effect
of each parameter on cache performance and the sensitivity of
performance to each parameter were determined.

Fig. 3 shows an example of the simulation results. The
graph shows that the biggest contributor to cache performance
is the size of the cache. The effect of diminishing
returns with increasing cache size is clearly seen. Fig. 3
also shows the effect of associativity on different sizes of
caches. The increased complexity of a multiple-set cache
can be weighed against the performance gain it provides.
This information was computed for all parameters, and the
best combination of performance, complexity, and cost was
determined. The final design was then chosen and simulated.
The Series 70 cache size is 64K words, which is 16
times larger than the Series 68 cache. Like the Series 68,
the Series 70 cache has 2 sets.

The simulator provides cache performance information
such as hit rates. It does not, however, provide a higher-level
view of system performance, such as throughput. In
particular, the memory reference traces do not include any
measurement of time between memory references. The
hardware monitor provides such information for the Series
68. An analysis of the system showed that the timing information
would be valid for the new cache. The simple
analytical model uses both the simulator data and the
hardware monitor data to produce estimates of the effect
of the new cache on the system. Besides the expected values
for cache statistics, the model also shows a saturation effect.
The lower the hit rate a system has, the more the new cache
will benefit it.

While the cache simulation traces were fairly stable, the 
data from real Series 68s had more variance. Statistical 
analysis of the data led to a range of values for the percent- 
age of CPU cycles that could be recovered by the Series 
70 cache. A 90% confidence interval was chosen for the 
range, which means that, given normal distributions and
random sampling, the mean of the cycles recovered should
lie within the range nine times out of ten. The range given
by the analysis was 19.4% to 28.3%, with an estimated
mean of 23.8% recovery of "lost" CPU cycles. Because it
was known that the traces from the Series 37 would give
optimistic results (since main memory for the Series 37 is
several times smaller than the Series 68), the target for the
Series 70 cache was set at 20%.

The other components of the Series 70 underwent similar 
analysis. The analyses were then combined to give a predic- 
tion of the overall system performance gain. Care was taken 
to estimate the overlap effects exhibited when two or more 
components try to use the same resource. Predictions of
the bottlenecks within the new system were also made. 

At this point, the performance prediction was used by
the marketing department to start working on the pricing
and positioning strategy for the product. Since the normal
approach is to wait for the actual hardware to become
available to measure the performance gain, valuable time was
saved in the introduction process through the use of the
highly accurate simulation results.

Fig. 3. An example of simulation results (Series 70 cache
simulation: hit rate versus total cache size versus
associativity), showing that cache size has the greatest
effect on cache performance.



Design Tracking 

Measurements and modeling require a lot of work that 
precedes product development. Not only do they help es- 
tablish the correct design, but they are also often valuable 
during the development phase. Unexpected design changes 



How a Cache Works 



The purpose of a cache memory is to reduce the average
effective access time for a memory reference. The cache contains
a small high-speed buffer and control logic situated between the
CPU and main memory.¹ The cache matches the high speed of
the CPU to the relatively low speed of main memory.

Caches derive their performance from the principle of locality.
This principle states that, over short periods of time, CPU memory
references from a process tend to be clustered in both time and
space. This means that data that will be in use in the near future
is likely to be in use already. It also means that data that will be
in use in the near future is located near data in use already. The
degree to which systems exhibit locality determines the benefits
of a cache. A cache can be as small as 1/2000 the size of main
memory, yet still have hit rates in excess of 90% under normal
system loads.

The cache takes advantage of locality by keeping the most
often used data in a high-speed buffer. When the CPU requests
data, the cache first searches its buffer. If the data is there, it is
returned in one cycle; this is termed a cache hit. If the data is
not there, a cache miss occurs and the cache must retrieve the
data from main memory. As it does so, it also retrieves several
more words of data in anticipation that they will soon be requested
by the CPU. Servicing the miss can take many cycles.
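
The hit and miss behavior just described leads to the usual weighted-average estimate of access time. The sketch below is only an illustration: the text says a hit is served in one cycle and a miss takes "many cycles", so the value `miss_cycles=10` is a hypothetical figure, and the function name is an assumption.

```python
def effective_access_cycles(hit_rate, miss_cycles, hit_cycles=1.0):
    """Average cycles per memory reference for a given hit rate."""
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


if __name__ == "__main__":
    # miss_cycles=10 is hypothetical; the text gives no specific miss cost.
    for hit_rate in (0.90, 0.95, 0.98):
        cycles = effective_access_cycles(hit_rate, miss_cycles=10)
        print(f"hit rate {hit_rate:.2f}: {cycles:.2f} cycles per reference")
```

Because the miss term is weighted by (1 - hit rate), a few points of hit rate translate into a large share of the average access time when misses are expensive.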

To achieve one-cycle access, it is not possible to search the
entire cache for the desired data. Instead, every word of data is
mapped to (or associated with) only a limited number of places
in the cache. When the data is requested, these places are
searched in parallel. Each part of the cache that can be searched
in parallel is called a set. One-set and two-set caches are most
common because of their reduced complexity.

Cache designs also include decisions on how much data to
fetch for each cache miss and how to decide which data to
replace on a miss (for two or more sets). The Series 70 simulations
confirmed previous academic work that the most important factor
in cache performance is the size of the cache. The simulations
also showed that the design decisions for the other factors of
the Series 68 cache were still appropriate for the Series 70 cache.
Thus the performance could be achieved with a minimum of
change to the operating characteristics of the Series 68 cache. 1 5 

can be quickly modeled and the impact on system performance
assessed. The measurement tools can be valuable
aids in debugging the prototype. The Series 70 cache
development benefitted from these uses of the modeling and
measurement tools.

One of the requirements for the cache was to be able to
post all modifications of its buffer to main memory in the
event of a powerfail. Because the Series 70 had to be
backwards compatible with the existing Series 64 and 68
machines, it had to work with the two types of power
supplies on those machines. It was determined that the
older power supplies had insufficient carryover time to
complete the worst-case post.

Several alternatives were proposed to solve this problem.
The first alternative was a microcode routine that would,
for the older systems, routinely post all modified data from
the cache to main memory. The second alternative was 
circuitry to monitor the cache and post the data whenever 
the amount of modified data exceeded a certain threshold. 

To choose between the alternatives, it was necessary to
simulate them. It was very easy to modify the cache
simulator to evaluate the posting alternatives. However,
there was concern over whether the trace data would be
accurate for this type of simulation. Microcode was written
for the Series 68 to examine the percentage of the cache
that contained modified data. The microcode was executed
every half second across several machines for several days
of prime-time work. A total of 500,000 samples were
collected. The simulator also computed the same information
for a simulated Series 68. Fig. 4 shows the two distributions,
which agree closely. This provided the confidence needed
to use the results of the simulator for estimating the
performance impacts.

The simulator results quickly ruled out the use of the
microcode solution. It remained to calculate the performance
impact of the posting circuitry. The Series 70 was
simulated with different values of the posting threshold.
The results showed minimal impact and the posting
circuitry was included.
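
The threshold idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the Series 70 circuitry: it tracks which blocks hold modified data, posts the oldest modified block whenever the count exceeds the threshold, and ignores write-backs caused by ordinary replacement. The function name and the oldest-first posting order are hypothetical.

```python
from collections import OrderedDict


def run_with_posting(writes, threshold):
    """Return (posts, peak) for a sequence of written block addresses.

    posts: how many blocks were posted to main memory by the threshold rule.
    peak:  the largest number of modified blocks held after posting, which
           bounds the work left for a worst-case powerfail post.
    """
    dirty = OrderedDict()   # modified blocks, oldest first
    posts = 0
    peak = 0
    for block in writes:
        dirty[block] = True
        dirty.move_to_end(block)        # most recently written goes last
        while len(dirty) > threshold:
            dirty.popitem(last=False)   # post the oldest modified block
            posts += 1
        peak = max(peak, len(dirty))
    return posts, peak


if __name__ == "__main__":
    writes = [1, 2, 3, 1, 4]            # invented write sequence
    for threshold in (2, 10):
        print(threshold, run_with_posting(writes, threshold))
```

A lower threshold shortens the worst-case powerfail post at the cost of more posts during normal operation, which is the tradeoff the simulations had to quantify.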

The measurement tools also helped in the effort to debug 
the initial prototype. The programs used to validate the 
measurement tools were now used to validate the operation 
of the new cache. In one case, a failure that occurred only 
after hours of operating system testing took just seconds 
to reproduce with the validation software. This helped 
track down the offending bug quickly and make the
prototype ready for further testing. In another case, when the
simulator results were being correlated with the actual
hardware results, there continued to be a significant,
unexplainable discrepancy between the actual results and the
simulator results. It turned out that a defective component 
in the hardware was causing a performance degradation, 
but not a fatal system error. Once the component was re- 
placed, the actual measurements correlated with the 
simulator results very closely. 

Benchmark Testing 

Benchmarks are repeatable workloads designed to exercise
a system or system component. The benchmarks are
first run on a standard system to establish a baseline.
Usually several runs are made, varying the number of users,
the amount of main memory, or other primary variables of
interest. The benchmarks are then run again with the new
system and results are compared.

Fig. 4. Comparison of simulated and measured results for the
percentage of the cache that contained modified data.

While a model predicts the effect of performance
enhancements, a benchmark can be used to measure the actual
effect. The repeatability of the benchmark is a key advantage
in measuring performance changes. Customer systems,
while more realistic, have constantly changing workloads
and environments. Hence it is much more difficult to
extract the effects of the new product from customer systems.

Benchmark selection and creation constitute a major
effort.⁵ The value of a representative benchmark cannot be
overstated. It is also valuable to include benchmarks that
have extreme values of important system variables. Sen- 
sitivity to these variables can then be calculated and used 
to provide performance estimates for classes of customers, 
not for just the typical case. The benchmark results are 
analyzed in light of the model predictions. Together, a 
refinement of estimates can be made for customer systems. 

The benchmarks used for the Series 70 are designed to 
be representative of heavy interactive data base use. Data 
base activity represents the majority of work done on the 
Series 68. The benchmarks are created by taking a complete
snapshot of a quiescent customer system, including a
track-by-track dump of each disc. Monitors installed on each
terminal record every keystroke. The system is then run
normally for at least an hour. The traces of terminal activity
are then turned into scripts for a terminal emulator.

To run a benchmark, the system and discs are restored 
to look exactly like the original system. The terminal 
emulators are run on an HP 3000 Series 48 connected to
the terminal ports of the target system. The number of users
can be varied by enabling or disabling emulators. Each 
benchmark is run for one hour. The repeatability of the 
benchmarks is typically well under a 1% change in trans- 
action throughput. 

The baseline for the benchmarks was an HP 3000 Series
68 running the MPE VE operating system. The number of
users and the amount of main memory were varied over a
total of seven runs. The system then had the Series 70
hardware cache added, keeping everything else constant
(except for small changes in the microcode required to
support the cache). The benchmarks were run again under
the same parameters. All benchmark runs used both the
hardware monitor and the HPSnapshot collection system.

Several metrics are required to understand the effect of 
the Series 70 cache completely. The results of the bench- 
mark runs are shown in Table I. 

Table I

Hardware Metric            Before    After    Difference

Hit Rate                   91.0%     98.2%     7.2%
Frozen Cycles              34.8%     11.6%    23.2%
Paused                      8.0%     14.0%     6.0%

Software Metric                               Improvement

Transactions/hour                             18.9%
Transactions/CPU second                       32.3%


The hardware metrics show the average measured values 
before and after the addition of the Series 70 cache. The 
hit rate of the Series 68 baseline benchmarks is lower than 
that of the simulation runs, and the Series 70 cache bench- 
marks show a lower hit rate than the simulations. However, 
the difference between the hit rates for the benchmarks is
similar to that for the simulations. Since the performance
increase is determined by the hit rate difference, the results
are still significant. "Frozen cycles" indicate the percentage
of CPU time that is spent waiting for a cache miss.
Essentially no useful work can be accomplished during this time.
The time that the CPU is paused waiting for I/O has been
factored out of this metric. The difference in the percentages
of frozen cycles is the primary metric of raw CPU
performance improvement. Again, the simulation and
benchmarks are in close agreement in the difference in the
percentages of frozen cycles between the Series 68 and 70
caches. The percentage of time that the system was paused
also increases with the Series 70 cache. This is an indication
that the system can tolerate additional load with nearly
the same response time.

The software metrics presented here use transactions as
the basic unit of measurement. Transactions per hour is a
customer-oriented metric. It is an indication of how much
interactive work the system is processing. A transaction is
defined as the work the system does between the time the
Return or Enter key is pressed until the system replies and
posts another read at the terminal. Transactions per hour
cannot be averaged over the benchmarks, since each workload
has its own mix of simple transactions (an MPE command,
for example) and complex transactions (a data base
query). The improvement, however, can be averaged. The
benchmarks show an improvement of 18.9% more transactions
per hour with the Series 70 cache. The metric
transactions per CPU second is based on transactions per hour
but factors out the percentage of time paused. It is a measure
of the improvement the system is capable of delivering
when loaded with potential bottlenecks discounted. The
32.3% measured improvement gave a high level of confidence
that the Series 70 cache would be successful.
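
One plausible reading of "factors out the percentage of time paused" is to divide the transaction count by the non-paused CPU time. The sketch below works through that reading; the formula is an assumption, the function names are hypothetical, and the transaction counts are invented. The published 18.9% and 32.3% figures are presumably averages of per-run improvements, so substituting the Table I averages would not be expected to reproduce them exactly.

```python
def tx_per_cpu_second(transactions, elapsed_seconds, paused_fraction):
    """Transactions per second of non-paused CPU time (assumed definition)."""
    busy_seconds = elapsed_seconds * (1.0 - paused_fraction)
    return transactions / busy_seconds


def improvement(before, after):
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


if __name__ == "__main__":
    # Hypothetical one-hour runs: the transaction counts are invented.
    before = tx_per_cpu_second(10_000, 3600, paused_fraction=0.08)
    after = tx_per_cpu_second(11_890, 3600, paused_fraction=0.14)
    print(f"per-hour improvement:       {improvement(10_000, 11_890):.1%}")
    print(f"per-CPU-second improvement: {improvement(before, after):.1%}")
```

Because the paused fraction rises with the new cache, the per-CPU-second improvement comes out larger than the per-hour improvement, which matches the direction of the two software metrics in Table I.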

The initial cache performance estimates were then 
reevaluated with this new data. Since these benchmarks 
are data base oriented and include up to 78 active users, 
the benchmarks tend to push the system fairly hard. Since 
busy systems will benefit most from the improved cache, 
the benchmarks were estimated to be slightly optimistic. 
The benchmark runs, therefore, led to changing the upper
bound for performance improvement from 28.3% to 23.2%.

More extensive benchmark runs were made later with 
all of the Series 70 components. A careful study had been 
made beforehand to ensure that the components didn't 
fight over the same resource and that the performance in- 
creases would mesh well. The benchmark runs showed 
that the integrated Series 70 met or exceeded expectations 
under all the conditions tested. 

Field Testing 

Field testing is an integral part of the product life cycle.
It tests all aspects of product functionality, including
reliability and documentation. Field testing should also be an
integral part of the performance engineering cycle.
Measurements under real conditions are the final validation of
the product.

Realistic Cache Simulation

Trace-based simulators are heavily dependent on the traces
used. It is generally very difficult to collect memory reference
traces for cache simulators. It is even more difficult to collect
representative traces of the system being studied. Alan Jay Smith
of the University of California at Berkeley surveyed a large number
of cache studies and found that very few contained representative
traces. His article points out major pitfalls in memory
reference tracing:

1. A trace represents only a small sample of the workload.
To combat this deficiency, many long traces are needed. The
traces must be long enough to overcome start-up effects of the
simulator and also long enough to capture major perturbations
to the cache. Smith's study used 49 traces of varying lengths
for a total of 13.5 million memory references. Each trace in the
Series 70 effort is one million memory references long. Very few
previously published studies have used traces of this magnitude.
One million memory references virtually guarantee the capture
of all essential influences to the cache. In addition, less than 1%
of the trace is used in the start-up of the simulator. A total of 30
million references were used in the Series 70 cache study. This
amount of data is significant. No study known to the authors has
used as much data from a single machine. The statistical analysis
of the simulations also indicates that the 30 million references
are sufficient to make good predictions.

2. Traces are usually taken only from the initial part of small
programs. This is because the large majority of studies use
program interpreters as the means of collecting the traces. The
interpreters have a large overhead and can usually handle only
small programs, and then only the initial part of the program's
execution. This is not representative of either the program or the
system. The Series 70 study used passive hardware to monitor
the addresses. The sampling was random and covered all parts
of the program execution for both large and small programs. The
random sampling ensured that the traces were representative.

3. Most studies do not trace operating system code. Another
result of using interpreters is the inability to trace the operating
system. However, it has been shown that operating systems have
rather poor locality compared to user programs. Current operating
systems consume a major portion of CPU time, so their effects
must be included for accurate representation. Again, the passive
hardware tracing, as used in the Series 70 analysis, automatically
includes operating system effects. This is a major contribution
to the state of the art in cache simulation.

4. Task switches, which can greatly impact cache performance,
are not adequately accounted for. Most simulators
that use interpreters flush the cache every so often to
simulate the effects of task switching. If the distribution of task
switches is not well known, errors may develop.
Hardware tracing eliminates the guesswork. Task switches
are included if the traces are long enough. Each trace taken for
the Series 70 study was analyzed and found to have dozens of
task switches.

5. The sequence of memory addresses is dependent on any
buffering implemented on the machine, and on the architecture
of the machine. The traces taken for the Series 70 came
from the Series 37, which has the same architecture as the Series
70 and no internal buffering. While all architectures exhibit the
same general cache behavior with respect to design parameters,
absolute numbers for cache performance must depend on having
similar architecture and workload.

6. I/O activity is seldom included in the traces. For this reason,
not much has been known on the effect of I/O activity on cache
behavior. However, the bus structure of the Series 37 permits
the collection of I/O references as they occur. Therefore, the
simulations accurately model the effects of I/O on cache
performance, which leads to improved predictions.

The traces taken for the Series 70 advance the state of the art
in cache simulation. The predictions of cache performance agree
closely with measured results. However, there is still room for
improvement. The Series 37 used in the tracing only had 2M
bytes of main memory. The Series 70 can have up to 16M bytes.
The Series 37 also did not have MPE disc caching enabled. It
is anticipated that many Series 70s run with disc caching. These
deficiencies were recognized early and their effects were
compensated by conservatism in the analysis.

The Series 70 analysis spawned a more accurate hardware
tracing system. It can record a million cycles of machine
execution in real time for many high-speed systems. Up to 144 bits
can be collected during each cycle. The system consists of 18M
bytes of very high-speed RAM, a Series 37 acting as a controller,
an HP 9144 cartridge tape drive for data storage, and interface
circuitry to the system under test, all on a portable cart. It is
currently being used with the HP 3000 Series 930 and is providing
state-of-the-art measurement capabilities for this RISC-like HP
Precision Architecture machine.

Measurements also help validate the whole performance 
methodology. Results of field testing are compared with
the models to gauge the accuracy of the prediction
techniques. Sometimes the measurements suggest how the
models can be improved. Field measurements also validate
the benchmarks. The performance increase shown by the
benchmarks can be compared to that seen in the field to
show how representative the benchmarks really are.

Field testing does not always produce the most accurate 
data on the effects of a new product. The dynamics of 
production systems in the field sometimes make it difficult
to distinguish between new product effects and normal
system variations. One way of overcoming this limitation
is through long-term testing. Patterns of normal variation
can be distinguished and accounted for when many
samples are collected over a long time.

Site selection for field testing resembles benchmark
selection. A mixture of representative sites and sites with
unique environments helps to determine average performance
characteristics and the sensitivity to site variations.

Field testing completes the performance engineering 
cycle. The data collected in this step not only gives final
results for the new product but, if done correctly, provides
the customer characterization needed at the beginning of
the performance engineering cycle for future products.

Performance testing of the Series 70 cache in the field
was a major priority. Thirteen HP 3000 Series 68 sites were
selected to participate in the alpha and beta test phases.
All sites ran a series of HPSnapshot measurements. The
characteristics include the number of users, percentage of
CPU busy, the type of work done, and the amount of main
memory. These were then compared to corresponding
averages of a large set of customers. Of the thirteen sites,
seven were picked to receive the hardware monitor.

The hardware monitors were installed on each site from 
two weeks to two months before installation of the Series 
70, and for two weeks to one month after installation of 
the Series 70. The hardware monitors were able to separate
the effects of the cache from the rest of the system. Data
was recorded for 23.5 hours of each day, in half-hour
samples. These baseline measurements were long enough and
detailed enough to determine the variations caused by shift 
changes and weekends. In some cases monthly effects were 
noticed. Once started, the hardware monitors required 
human intervention just once a week, and then just to 
replace a flexible disc. The measurements had no effect on 
the systems under test. 

A total of 10,500 half-hour samples were collected over
the seven sites. This represents over 5 billion cycles. Of
the total, over 3000 samples were from prime time,
representing over 1.5 billion cycles.

The difference between the Series 68 and 70 caches was 
calculated for several variables for each site. Table II sum- 
marizes the results. 

The calculations treat each site as one aggregate sample. 
The 90% confidence interval is a two-tailed t-test with six 
degrees of freedom. Statistically, it represents the range
where the population mean is expected to be. 
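
A 90% interval of this kind can be sketched as follows, treating each of the seven sites as one sample. The critical value 1.943 is the standard two-tailed 90% Student's t value for six degrees of freedom; the per-site differences below are invented for illustration and are not the measured data, and the function name is hypothetical.

```python
import math
import statistics

# Two-tailed 90% critical value of Student's t with 6 degrees of freedom.
T_CRIT_90_DF6 = 1.943

# Invented per-site hit rate differences (percentage points), one per site.
HYPOTHETICAL_SITE_DIFFS = [5.8, 6.3, 6.0, 6.5, 5.9, 6.2, 6.0]


def confidence_interval(samples, t_crit):
    """Return (mean, half_width) of a t-based confidence interval."""
    mean = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, t_crit * sem


if __name__ == "__main__":
    mean, half = confidence_interval(HYPOTHETICAL_SITE_DIFFS, T_CRIT_90_DF6)
    print(f"mean difference {mean:.2f} +/- {half:.2f} percentage points")
```

With only seven samples, the t critical value is noticeably larger than the normal-distribution value of about 1.645, which widens the interval accordingly.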



Table II
Prime-Time Field Measurements

Hardware                                            90% Confidence
Metric           Series 68  Series 70  Difference   Interval of Difference

Hit Rate         92.6%      98.7%       6.1%        ±0.35%
Frozen Cycles    29.5%       8.5%      21.0%        ±0&l
Paused           34.4%      48.5%      14.1%        ±7.92%



The cache hit rate improvement is consistent over the
seven sites. Every site experienced significant improvement.
The improvement is further confirmed by the decrease
in frozen cycles. Again, there was consistent
improvement over all the sites. It is expected that the average
improvement over all customers will lie in the range of
20.0% to 22.0% cycles recovered during prime time.

Fig. 5 shows the progressive estimates of the difference
in hit rate and frozen cycle percentages. Each point is the
best estimate for the mean difference over all customers.
The box surrounding each point indicates the 90% confidence
range. Note that the hit rate confidence intervals
were very narrow, indicating uniformity in the measurements.
However, the effects of workload differences are
apparent in the means actually measured on customer systems.
The confidence interval for the frozen cycle difference
progressively narrowed during the project. The added
information at each step helped to define more clearly the
performance of the cache. The final mean is within the
original confidence interval, thus validating the estimation
methodology.

Fig. 5. Differences between Series 68 and Series 70 cache hit
rates and frozen cycle percentages, with 90% confidence
ranges, for the simulation, benchmark, and field measurement
stages.

CPU pause time is often used as a measure of improve- 
ment. It is included here for comparison with the other 
metrics. While the Series 70 cache can add significant CPU
capacity to the system, CPU paused is not a very stable
statistic in most customer environments.

The Series 70 cache not only exhibited the expected 
average performance improvement over all sites, but was 
found to be fairly insensitive to variations in workload and 
configuration. This can be seen in the distributions of the 
hit rates and frozen cycles in Figs. 6 and 7. 

The distributions of hit rates for the Series 68 and 70 are 
shown in Fig. 6. Not only has the Series 70 moved the
distribution significantly higher, it has also narrowed the 
distribution, which is evidence of the saturation effect. The 
saturation effect reduces the sensitivity of the cache to 
different operating conditions. 

The distribution of frozen cycles also shows a significant 
change in the distribution (see Fig. 7). The difference in 
the distributions is the measure of performance improvement.
The fact that the overlap of the distributions is very
small emphasizes that the performance improvement 
applies across all sites. 

HPSnapshot studies and customer observations confirm 
the predicted performance improvement for the entire 
Series 70 product. The data collected is now being used 
as part of customer characterization for future members of 
the HP 3000 product family. 

Summary 

The HP 3000 Series 70 is the result of precise measurements
applied to an existing system. The current customer
environment was extensively characterized and analyzed
for performance opportunities. Potential enhancements
were modeled and a set of enhancements was selected that
best met the criteria of higher performance, lower cost, and
short development time. The system design and the design
of each component were analyzed and tracked through
development. Benchmark tests were run on the prototypes.
Finally, the product was measured over several months in
the field to confirm the performance. The measurements
also serve as input for future products.

(Figure title: Series 68 vs. 70 Hit Rate in Prime Time, Alpha and
Beta Sites)

The Series 70 cache memory subsystem, as the major
performance component, was a direct result of careful
measurements and analysis. The performance analysis has also
advanced the state of the art in cache measurement and
prediction.